Google 



This is a digital copy of a book that was preserved for generations on library shelves before it was carefully scanned by Google as part of a project 

to make the world's books discoverable online. 

It has survived long enough for the copyright to expire and the book to enter the public domain. A public domain book is one that was never subject 

to copyright or whose legal copyright term has expired. Whether a book is in the public domain may vary country to country. Public domain books 

are our gateways to the past, representing a wealth of history, culture and knowledge that's often difficult to discover. 

Marks, notations and other maiginalia present in the original volume will appear in this file - a reminder of this book's long journey from the 

publisher to a library and finally to you. 

Usage guidelines 

Google is proud to partner with libraries to digitize public domain materials and make them widely accessible. Public domain books belong to the 
public and we are merely their custodians. Nevertheless, this work is expensive, so in order to keep providing tliis resource, we liave taken steps to 
prevent abuse by commercial parties, including placing technical restrictions on automated querying. 
We also ask that you: 

+ Make non-commercial use of the files We designed Google Book Search for use by individuals, and we request that you use these files for 
personal, non-commercial purposes. 

+ Refrain fivm automated querying Do not send automated queries of any sort to Google's system: If you are conducting research on machine 
translation, optical character recognition or other areas where access to a large amount of text is helpful, please contact us. We encourage the 
use of public domain materials for these purposes and may be able to help. 

+ Maintain attributionTht GoogXt "watermark" you see on each file is essential for in forming people about this project and helping them find 
additional materials through Google Book Search. Please do not remove it. 

+ Keep it legal Whatever your use, remember that you are responsible for ensuring that what you are doing is legal. Do not assume that just 
because we believe a book is in the public domain for users in the United States, that the work is also in the public domain for users in other 
countries. Whether a book is still in copyright varies from country to country, and we can't offer guidance on whether any specific use of 
any specific book is allowed. Please do not assume that a book's appearance in Google Book Search means it can be used in any manner 
anywhere in the world. Copyright infringement liabili^ can be quite severe. 

About Google Book Search 

Google's mission is to organize the world's information and to make it universally accessible and useful. Google Book Search helps readers 
discover the world's books while helping authors and publishers reach new audiences. You can search through the full text of this book on the web 

at |http: //books .google .com/I 



RIVERSIDE TEXTBOOKS 
IN EDUCATION 

EDITED BY ELLWOOD P. CUBBERLEY 

PROFESSOR OF EDUCATION 
LELAND STANFORD JUNIOR UNIVERSITY 



DIVISION OF SECONDARY EDUCATION 

UNDER THE EDITORIAL DIRECTION 

OF ALEXANDER INGLIS 

ASSISTANT PROFESSOR OF EDUCATION 
HARVARD UNIVERSITY 



o .:i:izxiiii:::n:zii:x:D::r:n:i;:: 



EDUCATIONAL TESTS AND 
MEASUREMENTS 

WALTER SCOTT MONROE, Ph.D. 



ASSISTED BY 

JAMES CLARENCE DeVOSS, A.M. 



FREDERICK JAMES KELLY, PH.D. 



UNIVERSITY O 




HOUGHTON MIFFLIN COMPANY 

BOSTON NEW YORK CHICAGO 

ff be DiUrrrii&e ptein CamMHt 






COPYRIGHT, 191 7, BY W. S. MONROE, J. C. DBVOSS AND F. J. KBLLY 



ALL RIGHTS RESERVED 



247999 



• • ^» • • 

• • • • •• 

• • • * • • 

• • • • • • 

• • • • 









• • • '• 






V fc . \, I. I V. 



V w 



CAMBRIDGE . MASSACHUSETTS 
U. S . A 




EDITOR'S INTRODUCTION 

Up to very recently our chief method for determining the 
efficiency of a school system was the method of personal 
opinion. When the work of a superintendent of schools 
was called in question, the schools were visited and personal 
opinions expressed as to their standing, In case of a dis- 
agreement among the visitors the efficiency became a matter 
of dispute, and the people of a community usually favored 
the opinion which most nearly coincided with their preju- 
dices and preconceived ideas. 

Relatively recently the method of comparison was intro- 
duced. By means of this method the school system under 
consideration is compared with other school systems of the 
same size and class, and with reference to a number of differ- 
ent items. After such a comparison has been made, it is 
possible to place the school system relatively. If the school 
system studied stands fourth out of twenty school systems 
compared in one item, thirteenth in another, and at the 
bottom of the list in three others, it is not difficult to deter- 
mine its position. It is evident that this is a much better 
method than the one of personal opinion. Its chief defect, 
though, lies in that the school system studied is continually 
compared with the average or median of its size and class. 
In other words, the school system is continually measured 
as against mediocrity, when as a matter of fact the average 
or median school system may not represent a good school 



r 



vi EDITORS INTRODUCTION 

system at all. Perhaps all of the school systems below the 
average or median should be classed as poor school systems, 
and even some of those above are not doing what a school 
system should do. 

Still more recently, and wholly within the past decade, a 
still better method for the evaluation of the work which 
teachers and schools are doing has been evolved. This new 
method consists in the setting up, through the medium of a 
scries of carefully devised " Standardized Tests," of standard 
measurements and units of accomplishments for the deter- 
mination of the kind and the amount of work which a school 
or a school system is doing. This new movement is as yet 
almost in its infancy, but so important is it in terms of the 
future of school administration that it already bids fair 
to change, in the coiirse of time, the whole character of this 
professional service. 

The significance of these new standards of measurement 
for our educational service is indeed large. Their use means 
nothing less than the ultimate transformation of school 
work from guesswork to scientific accuracy; the elimination 
of favoritism and politics from the work; the ending for- 
ever of the day when a personal or a pofitical enemy of 
a superintendent can secure his removal, without regard 
to the efficiency of the school system he has built up; the 
substitution of wetl-trained esjjerts as superintendents of 
schools for the old successful practitioners; and the chang- 
ing of school supervision from a temporary or a political 
job, for which little or no preparation need be made, to that 
of a highly skilled piece of social engineering. 

This new method for the evaluation of the work which a 




EDITOR'S INTRODUCTION vii 

school system is doing is so important that anyyoung man or 
woman of to-day who desires to prepare for school adminis- 
tration should by all means thoroughly famiUarize himself 
or herself with tlie aims and methods of this new type of 
administrative procedure. The underlying purpose of the 
new movement has been the creation of such standardized 
scales for measuring school work, and for comparing the 
accomplishments of different schools and groups of school- 
children, as to give to both supervisors and teachers definite 
aims in the imparting of instruction. Instead of continuing 
to teach without definite measuring-sticks, and to assign 
ta^ks and trust to luck and the growth process in children 
for results, which is comparable to the old-time luck-and- 
chance farming, it has been attempted to evolve standards 
of measurement which will do for education what has been 
done for agriculture as a result of the application of scien- 
tific knowledge and scientific methods to farming. 

Such an important new movement is of especial signifi- 
cance to the teacher in charge of a class, to the citizen in- 
terested in schools, and to the superintendent responsible 
for resiUts. 

To the teacher it cannot help but eventually mean not 
only concise and definite statements as to what she is ex- 
pected to do in the different subjects of the course of study, 
but the reduction of instruction to those items which can 
be proved to be of importance in preparation for intelli- 
gent Uving and future usefulness in life. It will mean, too, 
an ultimate differentiation in training for the different 
types of children with which teachers now have to deal, 
and the speciahzation of work so as to enable teachers to 



viii EDITORS INTRODUCTION 

obtain more satisfactory individual results. To the citizen 
the movement means the erection of standards of accom- 
phshment which are definite, and by means of which he can 
judge for himself as to the efficiency of the schools he helps 
to support. For the snpcrintendent it means the changing 
of school supervision from guesswork to scientific accuracy, 
and the establishment of standards of work by which he 
may defend what he is doing. 

Up to the present time nearly all of the work which has 
been done in the evolution and testing out of these new 
standardized tests has been work of a highly scientific and 
technical nature, most of the articles being written in a 
language which the layman can scarcely understand. Often 
no interpretation has been attempted of the results which 
have been obtained. The classroom teacher and the school 
principal have naturally not found these studies of much 
help to them in their work. 

This work has been carried far enough, however, so that 
the time now seems ripe for a clear and simple statement 
as to the nature of the different tests which have been 
evolved, their use, their reliability, what are the best stand- 
ard scores so far arrived at, and, in particular, how to 
diagnose the results and apply remedial instruction. This 
the three authors of the present volume in the series have 
attempted to give, and, to make their work of the largest 
possible usefulness to normal-school students, teachers, and 
principals of schools, they have cast the whole in lan- 
guage so simple and untechnical that the average grade 
teacher can read the book and understand it. In addition, 
to give still larger value to the book, they have added a 



EDITOR'S INTRODUCTION ix 

number of chapters, written in a similar simple and read- 
able style, giving the essential elements needed in under- 
standing simple statistical methods, the meaning of scores, 
the unreliability of school marks and their relation to stand- 
ardized scores, and the use of the standardized tests in 
the work of school supervision. 

No space has been taken up in merely reproducing the 
tests themselves, though samples, showing their nature, 
have been inserted. If it is desired to use the tests with a 
class, they will be needed in quantities, and they may then 
be obtained in quantities and for very small sums from the 
persons and at the places mentioned in the chapter biblio- 
graphies. These bibUographies also give the most impor- 
tant book or article describing in detail the construction 
and use of the tests, in case the worker desires to go further 
than this volume presents. Instead, the authors have used 
their space in explaining to teachers and school officers the 
nature of the tests, telling how to give and score them, what 
standings the pupils should attain in their use, and presenting 
a rather full description as to the significance of the results 
obtained and how to remedy the defective conditions which 
the use of the tests reveals. Inconsequence, the book should 
prove of much use not only to students in normal schools 
and colleges, but to teachers and principals in our public 
schools as well. The style and contents of the volume are 
such as also to adapt it to reading-circle study with teachers, 
or to the needs of the average citizen interested in knowing 
something as to the nature and uses of the Standardized 
Tests. 

Ellwood p. Cubberley. 



PREFACE 

This book is designed primarily for teachers. It is based 
on two years' experience in giving a course on educational 
measurements to prospective teachers in a state normal 
school and on the experience received from directing a 
Bureau of Educational Measurements and Standards. 

It is just twenty years since Rice startled the educators 
of this country by his proposal that the results of teaching 
spelling could be measured by a spelling test. His proposal 
was greeted with sarcasm and ridicule, but during the past 
two decades the opposition to the principle of educational 
measurements has almost entirely disappeared. To-day 
the widespread use of standardized tests and scales bears 
witness to the importance of this movement in American 
education. However, it is profitable to analyze our pres- 
ent interest in educational measurements. A thing may 
be interesting merely because it is new and spectacular. 
Scores are objective and are subject to graphical represen- 
tation. A chart displayed attracts attention. Evidence is 
not wanting to show that a considerable number of teachers 
look upon educational measurements merely as an interesting 
topic for teachers' meetings or as a means of attracting 
attention in their community. 

Standardized tests and scales are not "playthings." 
Neither are they teaching devices. They are instruments 
which furnish the teacher (1) with detailed and definite 



xii TREPACE 

aims, and (S) with a means for diagnosing the teaching 
situation which she faces. Unless the diagnosis is followed 
by remedial instruction the use of standardized tests and 
scales cannot be of much vaJue. They become mere "play- 
things." 

Our present testa are probably crude instruments, but 
the first railway locomotive was also crude. Even now stand- 
ardized tests and scales are superior to ordinary examina- 
tions, but, more important, their use tends to engender in 
the teacher a type of thinking about her work which is very 
helpful. By using them she recognizes objective standards 
to be attained and not to be exceeded, the present achieve- 
ments of her pupils, and that instruction must be suited to 
the needs of her pupils. .When a teacher comes to think 
of her teaching problem in these terms, she is in a position 
to increase greatly her efficiencj'. 

This book is addressed to teachers because they are 
charged with the instruction of pupils. The superintendent, 
principal, or student of education who is interested in the 
teacher's work also will find much of value in the book. 
Technical details of the derivation of tests are not given, 
but references are given so that one interested may pur- 
sue the n>otter. These were omitted because they are not 
essential to the use of the tests by teachers. For much the 
same reason the criticism of tests is made a secondary mat^ 
ter. The detailed criticism of tests and the derivation of 
improved ones must be left to the expert. The teacher 
needs to know only enough to enable her to choose wisely 
in selecting a test, and to prevent her from ascribing to the 
scores a signifiumce which is not justified. 



PREFACE xiii 

The newness of the field and the rapidity with which it 
is developing places limitations upon an attempt to write a 
text. It is recognized that probably before this volume is 
printed new tests will have been announced. However, the 
author believes that the point of view upon which the book 
is based is not merely temporary, and that, as new tests are 
available, the fundamental principles of the book may be 
applied to them. 

It is obvious that in an endeavor such as this one must 
utilize the results obtained by many investigators. In fact 
it is hoped that this book may have the virtue of summa- 
rizing these results. The author is keenly aware of his 
obligation to all whose work is mentioned in the follow- 
ing pages. Special mention shojild be made of Professor 
DeVoss, who contributed the chapter on "Handwriting." 
and of Dean Kelly, who wrote the chapter on "Reading." 

Walter S. Monroe. 

EifPOBiA, Kansas, April 27, 1917 



CONTENTS 

CHAPTER I. The Inaccuracy op Pbesent School 
Marks 1 

School marks — The inaccuracy of teachers' marks — Carter's 
investigation — Kelly's investigation — Conclusions from these 
studies — Johnson's investigation — Marking examination papers 
— Distribution of marks — Error due to unequal value of ques- 
tions — Another example of unequal values — Rate of doing 
work neglected — Wide range of topics included within an ex- 
amination — Most valuable topics for education — Questions 
and topics for investigation. 

CHAPTER n. Arithmetic 17 

I. The Problem of measurino Arithmetical Abilities. • 
Arithmetical abilities automatic or habits — Arithmeti- 
cal abilities distinct — Separate types in handling integers 
— Each a specific habit — Why we need to use arithmeti- 
cal tests. 

U. Standardized Tests for measuring Arithmetical Abili- 
ties. 

1. The Courtis Standard Research Tests, Series B — Mark- 
ing the papers. 

2. The Cleveland-Survey Arithmetic Tests — Spiral nature 
of these tests — Nature of the tests. 

8. The Woody Arithmetic Scales — Measurement by means 
of a scale — The addition scale. 

4. Research Tests in Addition of Fractions. 

5. The Stone Reasoning Test. 

6. Other reasoning tests. 

in. Standard Scores. 

1. Courtis Standard Research Tests, Series B — Courtis 
Standard scores. 

2. The Cleveland-Survey Tests — Cleveland and Grand 
Rapids scores. 

8. The Woody Arithmetic Scales. 

4. The Addition of Fractions Tests. 

5. The Stone Reasoning Test. 

The accuracy of individual scores — Gain or loss in re- 
peating tests. 

XV 



CONTENTS 

rV. How TO HANDLE WHAT THE TeSTB REVEAL. 

Sdenlific management — Diagnosis of the teaching at- 
uatioQ — Pupila' and tiasa rcpords charted — Meeting the 
situation; Iowa — Individual r*. tlass Deeds — Repeating 
the tests after an interval — Modifying the clasa drill — 
Use of practice tests — Questions and topics for investiga- 
tion — Bibliography. 

CHAPTER III. Reading 

The Problem of Mbasuheuent is Readinq. 
I. Silent Rba nna. 

(a) Recognition of words. 

1. Tiie Thomdifce ^Tsual Vocabulary Scales — Use 

and standards, 
a. The Haggertp visual Vocabulary Tests. 
3. Starch's English Vocabulary Teslj. 
(6) Tests for comprehension and speed. 

1. The Thorndike Scale Alpha. 

2. The Minnesota Scale Beta. 

3. The Courtis English Tests. 

i. Brown's Silent Reading Test— Value of the teat 
in diagnosb. 

5. Starch's Silent Heading Tests. 

6. Gray's Silent Reading Tests. 

7. The Kansas Silent Reading Tests — Score value for 
the Kansas tests — Standard median scores. 

8. The Courtis Sileot Reading Tests. 
n. Oral Reaoinq. 

1, The Jones Visual Vocabulary Tests. 

2, The Ilaggerty Visual Vocabulary Tests. 

3, Gray's Oral Reading Teat — How the tests are 

m. An Ebtiuate op Readinc Tebts. 

Criteria for estimating values — These criteria applied. 

IV. The Service of Readlng Tests. 

Service to the superintendent — Reveals wrong em- 
phasis in teaching — Service to the teacher — Service to 
the child — Remedying the situation revealed — Types 
of situations revealed — A normal situation — To raise 
the median score — Overemphasis on oral reading — 
Care from the beginning — Reading above the primary 
grades — Reading in the upper grades — Where variabil- 
ity is too wide — The difEcult but normal case; sugges- 
tions for helping — Questions and topics for investiga- 
tion — Bibliography. 





CONTENTS 

CHAPTER rv. Spelltoo 

I. The Problem of Measubeubnt w Sprluko. 

Difficulties encouDtered — The foundation words of tie 
English language — Making a speHing test. 
[. Sptnj.iNa Scales. 
1. The Ayres Spelling Scale ^ How constructed — Pupila who 
are not tested — How many words to use ^ Methods of 
giving the teat — Letters per minute — What the Ayrea 
scale really is ^Directions for giving a timed sentence test. 
8. The Buckingham Spelling Scale. 

3. Starch 'a Spelling Scale — Measuring the CKtent oT the abil- 
ity to spell ^Starch's spelling lists. 
ni. Stand ARiis, 

Ayres's scale — The Starch tests. 
IV. How TO Locate Spbluso Dipficultibb. 

Locating bad spellers — Individuality in spelling difficul- 
ties- — ^"Spelling Demons" — Types of misspellings — 
Teaching the pupil to correct his errors in spelling — 
Causes of some misspellings — Good teaching of spelling 
— Devices for improving spelling — Making assodatioiu 
automatic — Courtis's spelling practice tests — Questions 
and topics for investigation — Bibliography. 

"CHAPTER V. Handwriting 1 

I. The Prohlem of Meabukement in Handwritikg. 

Another method of measuring. 

II. HAKDWBlTtNG SCALEB. 

Measuring speed — Selections for the speed tests — 
MeasurioR quality ; use of scales — The score card for de- 
tailed analysis — The scales classified as to use^ — Methods 
of using scales — Measurement tor diagnosis — Use of the 
score card — The Freema n Scale — Using the Freeman 

III. Tbe Reliabiltt of Meabtjtirs and ScAI.Ea. 

RateandquaUty contrasted — Accuracy of the scores — 
Training in using the scales — Relative values of tie de- 
ferent scales. 

IV. SiAVDABD Scores. 

Freeman's proposed standards — What these standards 
rqiresent — Other evidence as to standards — Standards 
required for work. 
V. 1^ Teachivo Situation' Revf.aled. 

Handwriting a complex ability — An individual rather 
than a dass problem — Children diffo' widely in abjlitiea 



I 




CONTENTS 




— Plotting scores, and reading their meaning — 
measurements to re^'eitl progress -^ Meeting 
revealed — Systems of penmansliip 
Movement in handwriting — Rhythm — Speed — Quality 
and speed — General lawa o! learning applied — Devices 
of remedial ingtnjotioo — Increasing speed — Develop- 
ing rliythm — Motivating practice — Reasons tor using 
handwriting scales — Questions and topics for investiga- 
tion — Bibliography. 

CHAPTER VI. Langdage 1 

I J. The Pkoblbu of Mbabijrement m Lanquaob. 
One of measuring specific habits. 
-p. The Measuhement op Abiutt in Enolibh Composition', 
1. The Hillegas Scale. 
2. The Harvard-Newton Composition Scale. 
8. The Breed and Froalie Scale. 
4. Willing'9 Scaie. 
6. The Nassau County Supplement. 
Reliability of measurements — Dse of the scales — Di- 
rections for using the Hillegas Scale — Hillegas Scale 
scores — Directions for using the Harvard- Newton Scale 
— The Harvard-Newton Scale scores — Directions for 
using the Willing Scale ^ The Willing Scale scores — 
The Willing Scale reproduced. 
ni. The Measurbmeht of Lanouagb Abimtt bt Compi^tion 
Tests. 

The Trabue Completioii;Test Language Scales — Tra- 
bue test standards. 
IV. Thm Meahdbbment of Ability in English G&.uimar. 
Types of ability ~ Starch's Grammatical Scales — The 
Punctuation Stale — The Grammar Tests — Bucking- 
ham's Teat. 
V. Meahuhinq Accuracy in Copying. 

EThe Boston test ~ Kinds of errors made — Misspelled 
words — Undotted "i's" jmd uncrossed "t's." 
Educational Siqnificancb op the Use of these 
Scales and Tebtb. 
Fbding specific languaRC weaknesses— Remedying the 
situation rei-ealed - — Analyzing language ability — Ques- 
tions and topics for investigation — Bibliography. 
'ER Vn. HiGii-ScnooL Subjects . . . . S 
Algebha. 
The problem of measure ment — The fundamental opera- 



CONTENTS 

tions of Algebra — Standard Research Tests in Algebra — 
Other Algebra tests — Conclusions from the tests — 
Standards — Meeting the teaching situation revealed by 
Algebra tests. 

n. Gbometby. 

in. FoRBiGN Languages. 

Starch's language tests — The Hanus Latin Tests. 
Henmon's Latin Tests. 

IV. Physics. 

V. Otheb Tests which mat be used in the High School. 
Questions and topics for investigation — Bibliography. 

CHAPTER Vni. Statistical Methods . . . .241 

Good arrangement of scores — The median — Frequency of 
scores — Intervals of distribution — Approximate and true 
median — The average — The mode — Measures of variability: 
(1) Average deviation; (2) Percentiles; quartiles; probable error 

— Correlation — Coefficient of correlation — Graphical repre- 
sentation — Representing three qualities — Representing many 
quaUties — Questions and topics for investigation. 

CHAPTER IX. The Meaning op Scores .... 258 

Indefiniteness of school marks — School marks vs, scores — *"- 
Translating school marks into scores — A satisfactory standard — 
An efficient standard — Effort to be expended on the tool subjects 

— School demands on the tool subjects — Basis for standards of 
accomplishment — Types of standards — Questions and topics 
for investigation. 

CHAPTER X. The Derivation op Tests, and Ex- 
aminations 273 

Analyzing pupils* dass work — Bases for evaluating pupils* 
work — The cycle principle — The per-cent-of-pupils-solving 
basis — A normal distribution of ability — Points to be consid- 
ered in evaluating exercises — Opinion of competent judges — 
The teacher-judgment basis — Reliability important. 

Making examinations more eflfective — Care in framing ques- 
tions — Questions and topics for investigation. 

CHAPTER XI. Use of Standard Tests in the Super- 
vision OF Instruction 284 

Assisting teachers — Four steps in supervising instruction with 
tests — Giving the tests — Tabulating the scores — Interpreta- 
tion of the scores — Remedial treatment — ^Teachers need detailed 
and definite specifications — Courses of study represent working 

xix 



CONTENTS 

spedficationr— Such subject-matter directiom not quantitative 
— Such directions lead to formal and uniform instruction — The 
Tests aim to introduce quantitative work — Tests introduce scien- 
tific management — Handwriting an example of wasting time — 
The Cleveland Reading results a study in efficiency — Standards 
for instruction illustrated from Arithmetic — Results of using 
the Courtis tests in Boston- — The supervisor and the standard 
tests — Questions and topics for investigation. 

INDEX 803 



i 



} 

1 



! 



LIST OF CHARTS AND FIGURES 

IN THE TEXT 

1. Distribution of marks assigned to one Geometry paper by 

116 teachers 7 

2. Distribution of scores with the Stone Reasoning Test in 

Butte 43 

3. A chart, showing the scores made by a sixth-grade pupil, 

in comparison with standard scores, using the Courtis 
Arithmetic Tests 51 

4. Median scores for all Cleveland schoob, and five selected 

schools (Test D, division) 52 

5. The Ayres Spelling Scale 113 

• 6. Showing distribution of 91 pupils, according to the num- 
ber of words spelled correctly 117 

7. A Section of the Thomdike Handwriting Scale . . . 149 

8. Two sections of the Ayres Handwriting Scale . . 150-151 

9. Standard score card for measuring handwriting . . .155 

10. Individual record card. Freeman Scale .... 160 

11. Showing the distribution of scores in handwriting of a 

third-grade class 177 

12. Showing the distribution of scores in handwriting of a 

fourth-grade class 177 

13. Showing the distribution of scores in handwriting of a 

fifth-grade class 177 

14. Results of the Composition Test in Salt Lake City . .201 

15. Showing correlation of grades at entrance to college and 

in the Freshman year 253 

16. Graphic representation of the standard scores for Starch's 

Spelling Tests 254 

17. Another form of graphic representation of the same stand- 

ard scores as in Figure 16 254 

18. Representing graphically the standard for handwriting 255 

19. Representing a pupil's score in several subjects . . . 256 

20. Showing the meaning of school grades in terms of scores 261 

xxi 



LIST OF CHARTS AND FIGURES 

21. Showing a normal distribution of pupils according to abil- 

ity .. . . . 276 

22. Distribution in rank of 47 cities, arranged in classes ac- 

cording to time spent on handwriting .... 295 

23. Average scores in speed and quality of silent reading in 

each grade in Cleveland, and in 13 other cities . . 297 

24. Same, in three selected Cleveland schools .... 298 



* • 



, * •«' * 



» - o ' 



EDUCATIONAL TESt^' AND 
MEASUREMENTS 



t i, « J 



CHAPTER I 

THE INACCURACY OF PRESENT SCHOOL MARKS 

School marks. Educational measurements are not new in 
school work* Since schools have existed, teachers and other 
school officials have atrtempted to measure the abilities of 
pupils by estimating daily recitations and by examinations. 
The measures of the abilities of pupils obtained in these ways 
are thought to possess a high degree of precision and are 
treated very seriously. 

The promotion of pupils depends upon the "grades " they 
receive. The ability of a pupil in each of the subjects is meas- 
ured by the teacher's estimate and by examination, and, if 
the resulting measures show the pupil to be a few points, or 
in some instances a fraction of a point below the "passing 
mark," the pupil is classified as a failure. If the resulting 
measures equal or are above the "passing mark," the pupil 
is promoted. 

The "grades" or school marks are entered upon the 
monthly or quarterly report cards. Parents, as well as teach- 
ers and pupils, take these school marks very seriously. If 
Johnnie's "grades" for a given month are below those of the 
preceding months, or, worse still, if they are below those 
of neighbor Smith's Mary, an explanation is demanded. A 



r 



e EDUCATIONAL TESTS AND^tolASUREMENTS 

permanent record is kept of;Ut^gl^s^th*e yearly " grades," and 
the awarding of school Wbnbfs'is based upon it. 

Until jeceriflj'j'Vfa'Ctically all admission to college was 
detpci^indd by 'examination. Except in the universities and 
, e^a^^ of the Central and Western American States the 
■ 'tlistom still maintains generally throughout the world. This 
practice is based on the assum,ption that the examining com- 
mittee can determine thereby the effectiveness of the candi- 
date's college preparatory work. The civil service, from its 
inception in China centuries ago until the present day, has 
employed the examination &s a means for measuring the 
ability of persons who desire positions operated under this 
system. 

The inaccuracy of teachers' marks. Within the kst few 
years a number of investigations have been made to ascer- 
tain the accuracy or reliability of measures obtained by means 
of teachers' estimates and by means of examinations. In the 
world of physical things we measure distance by means of 
the yardstick, mass by means of scales, the volume of liquids 
by means of gallon measures. Measures of these magnitudes, 
when made carefully with accurate instruments, possess a 
high degree of reliability. By a high degree of reliability we 
mean, for example, that if two persons measure the length 
of the same room by means of the same yardstick or any 
other yardstick, the two measurements will be approximately 
equal. If tliey differ by more than one or two inches, we doubt 
theaccuracyof both, and we demand that the room be meas- 
ured again. Similarly, in the case of school-children, if we 
find that when the same children are measured in the same 
subjects by two different teachers, the two sets of measures 



INACCUHACY OF SCHOOL MARKS 3 

do not agree rather closely, we have reason to doubt the 
accuracy of both sets of measures. On the other hand, if 
the two sets of measures ("grades") agree closely, we have 
reason to believe them accurate or reliable. 

Carter's investigation. In 1011, Carter ' investigated the 
school marks which had been given to the pupils who com- 
pleted the eighth grade in three elementary schools in the 
city of Milwaukee, Wisconsin, and who entered the same 
high school. It was found in the case of arithmetic that 
two thirds of the marks given in school B were below 78; 
in school A, one third were below 79; in school C, ons tiiird 
were below 82. Taking the higher marks, in school B, one 
third were above 78; in school A, two thirds were above 79, 
and one third above 84; in school C, two thirds were above 
88 and one third above 88. According to these marks it is 
evident that the pupils in school C received much higher 
marks than did the pupils in the other two schools, and 
that the pupils of school B were judged to be conspicuously 
inferior to the pupils in the otter schools. If these marks 
represent accurate measures of the ability of these pupils in 
the field of arithmetic, we would expect the pupils in school 
C to receive the highest marks in mathematics when the 
pupils from the three schools went to the same high school. 

Carter sums up his conclusions in these words: "When 
the rank of the pupils in arithmetic was compared with their 
rank in algebra, it was found that a greater percentage of 
school B (the school which gave the lowest marks) excelled 
in maintaining their original rajik or increasing it. In fact, 

• Carter, R.E., "Correlation of Elementary Schools and Higb Schools"; 
in Elimentary School Ttacher, vol. 18, pp. 109-lB. 



4 EDUCATIONAL TESTS AND MEASUREMENTS 

there was a complete reversal of things from what the abso- 
lute marks might indicate." Thus, we find that the two sets 
of measurements of these pupils are characterized by differ- 
ences rather tlian agreement. We, therefore, have the evi- 
dence of this investigation that the marks assigned by the 
teachers were inaccurate measures of the abilities of the 
pupils. 

Kelly's investigation. In 1913, Kelly ^ made a similar 
investigation of the marks given to the sixth-grade pupils 
in four ward schools in Hackensack, New Jersey, and the 
marks given to the same pupils when they went to a com- 
mon departmental school for seventh -grade work. He states 
his conclusions as follows: "This means that for work which 
the teacher in school C (one of the ward schools) would give 
a mark of 'G' (good) in language, penmanship, or history, 
the teacher in school D (another ward school) would give less 
than a mark of *F' (fair)." 

Conclusions from these studies. From these two inves- 
tigations (and many others have been made which might 
be quoted °) it is clear that when different teachers meas- 
ure the abilities of the same pupils in the same subjects 
by means of examinations and estimates of recitations, 

(they give different "grades." Hence, we must conclude 
that teachers' marks are unreliable, that is, they are in gen- 
eral inaccurate measures of the abilities of pupils, 

Johnson's investigation. Another type of investigation 
has been made by Johnson,^ Principal of the University 

' Kelly, F. J. Teachers' Marks. (Teachera College Contributions to 
Education, no. 00, p. 7.) 
' Sec Kelly, F. J, Teackeri' Marks, pp. 6-50. 
* Johnson, F. W. "A Study of High School Grades"; in Scliool RBnieie, 



M 



INACCURACY OF SCHOOL MARKS 5 . 

High School of the University of Chicago. In the Univer- 
sity High School, "F" deiiotes failure, and the four succes- 
sive ranks above failure are indicated by "D," "C," *'B," 
and "A." For the several departments of the schjool, John- 
son tabulated the number of times each mark was given 
during the years 1907-08 and 1908-09. The facts revealed 
by these tabulations may be illustrated by the following. 
In EngUsh the per cent of failures was 15.5, which was 
nearly double that of history (8.1). The highest' mark 
("A") was given to 9.3 per cent of the pupils taking 
French, but in the German department it was awarded to 
17.1 per cent. Enghsh and history occupied similar places 
in the program of studies. They were taken by practically 
all students. French and German likewise occupied similar 
places in the school. Thus, there is no apparent reason 
why these diflferences in marking pupils should exist. This\ 
lack of uniformity indicates the lack of imiform standards 
in marking pupils which, of course, means that the mark-/ 
ing is probably inaccurate. 

Marking examination papers. The written examination 
is the most common means of measuring the abilities of 
pupils, although many teachers and school patrons oppose 
its use. They contend that pupils working under pressure i 

1 

frequently become nervous and confused and consequently I 
cannot do themselves justice, while other pupils, who have 



no real ^asp of the subject, are able by cramming to write 
excellent papers. It is also contended that the questions 
are frequently not well selected and do not pertain to the 
essentials of the subject. 

voL 19, pp. lS-24. See also Kelly, F. J., Teachers' Marks, p. 11, and fol- 
lowing for reports of similar investigations. 



> 



6 EDUCATION.U, TESTS ANT) MEASCRKMENTS 

There is probably some Imtii in the above assertions. 
but within the past few years there have been a number of 
urveatigationa to ascertain if teachers mark examination 
papers accaratelT', assuming that what a^qiears on the 
papers is a trae record of the abilities of the popils. Stardh 
and Elliott ' investigated the accm^cy with which teachers 
marked papers in Eo^ish, geometry, and history. Their 
method and the facta revealed may be illustrated by tbe 
case of geometry. 

A facsimile reproduction was made of an acttial examina- 
tion paper in plane geometry. A copy of thb reprodaction 
was sent to each of the high schools included in the North 
Central Association of Colleges and Secondary Schools, 
with the request that it be marked on the scale of one hon- 
dred per cent by the teacher of geometry. Papers were 
returned from 116 schools, and the results tabulated. 
When we consider that the subject-matter of geometry is 
quite definite, and that the papers were marked by teachers 
who were thoroughly acquainted with the subject, it would 
seem that we might expect the marks or "grades" placed 
upon this examination paper to be in close agreement. How- 
ever, exactly the opposite was the case. 

Distributioa of mai^s. The distribution of the marks 
is shown in Fig. 1. Of the 116 marks, two were abo^-e 90, 
while one was below 30, Twenty were SO or above, while 
twenty other marks were below 60. Forty-seven teachers 

' starch anil Elliott. "RdUbaily of Grading High-School Work in 
Ensliih": in School Rrriftr. vol. eo.pp. 44S-57: " Rellahility of Grading 
Work in Mathematics"; in HrJioal Rerieie, vol. 81, pp. «5t-S9: "Reliabil- 
ily of CrodiDg Work in History"; in ScAooi Rerinc, vol.il, pp. 676- 




INACCURACY OF SCHOOL MARKS 



assigned a mark ptasisig or above, while 
the paper not worthy of a passing mark 



• •••• •••• 

• ••• ••••••• •••• •••• • » 

• • • •••••••••••• ••••••••• •••• 



28 



68 66 



80 



86 



70 



76 



80 



85 



90 



Fig. 1. DiBTBiBiTnoN of Masks Assigned to one Gbometst Papeb 

By 116 Tsachebs 

Faadng grade 75. Range 28 to 92. Marks assigned by schools whose passing grade 
was TD weie wei^ited bgr 8 points. Median 70. Probable error 7JS 

Not only were similar results obtained by Starch and 
Elliott in English and in history, but other investigators ^ 
have verified them many times. In the face of such facts 
only one conclusion is possible; namely, that under ordi- 
nary conditions the marks assigned to examination papers 
by teachers are very unreliable. Such marks can represent 
only very crude and very inaccurate measures of the abili- 
ties of pupils, (it is not too much to say that the mark 
which a pupil receives on an examination paper depends 
upon the teacher who grades the paper, as well as upon 
what the pupil places upon the paper. 

It has also been shown that the same teacher is not con- 
sistent in his own marking. If a set of papers are marked a 
second time, the two sets of marks will vary widely.' 

^ See Starch, Daniel. Educational Measurements, p. 8, for the factors 
which are responsible for this inaccuracy. 

' See Kelly, F. J. Teachers' Marks, p. 51, and following, for accounts 
<^ other investigations. 

' See Starch, Daniel. Educational Measurements, p. 0. 



EDUCATIONAL TESTS AND MEASUREMENTS 



r 

B \ Error due to unequal value of questions. A critical study 
H of examinations and of the manner of giving them reveals 

H other causes of inaccuracy in teachers' marks. In the first 

H place, tlie questions are generally considered equal in value, 

H and a pupil is given as much credit for answering a very easy 

I question as for answering a difficult one. Or, if the questions 

I are not considered equal in value, values are arbitrarily 

H assigned by the teacher upon the basis of her estimate of 

H the importance of the questions rather than upon the basis 

V of their difficulty for pupils. In the case of the most fun- 

damental facts, the importance may be used as the criterion 
of value, but in general the difficulty of the questions and 
the time required for answering them are used as the criteria 
of value. The difficulty of a question is represented by the 
per cent of correct responses. 

Investigation has shown that teachers' estimates of the 
values of examination questions vary as widely as do the 
marks which they assign to examination papers. Inglis ' 
sent a set of ten questions in plane geometry to about 
three hundred teachers of mathematics in the high schools 
of the Middle States and New England, with the request 
that each teacher assign to each question the number of 
points which should be allowed for a correct answer. The 
only limitation placed upon the teachers was that the 
total number of points should equal 100. Out of 123 re- 
plies which were received, 20 were to the effect that a value 
of teu points should Ije assigned to each question. The 
variations of the remaining 102 judgments are shonii in 
Table I. 

' Inglis, AlFnander, "Variability of Judgments in EquBlizmg Values in 
Grading"; in Educatioaai Adminittralion and Supvrrinon, vol. i, pp. ii-S). 



INACCURACY OF SCHOOL MASXS 9 

It is interesting to note that the values 5,-8, 10, 12, and 
15 were used most frequently. This was probahly because 
they are convenient values and because the teachers were 
accustomed to use them and not because of any sdentific 
determination of the value of the questions. The range of 
values is surprisingly large. That teachers should differ so 
widely in their judgments is significant. It emphasizes the 
chaotic condition which now exists and will continue to 
exist until we have standardized tests. 



Table I. The Vahiabiutt or Teachers' EsnMATBS 
Value of Examination Questions 


of the 






(pff«j) 


' 


' 


3 


* 


5 


6 






9 


M 


Talai 


Per 

eaU 


a:::::::::::::::: 


J 




"2 


3 


1 


i 
J 

1 

21 

I 

'i 

■• 


1 
1 


I 

i 

21 


i 

i 


13 

i 

i 


S 
11 


\ 




4 
















r-:::;::::::: 












JJ-6 




12-0 












16 






1 


l.B 


IS 


If 


1-5 


20 


V. 




IS 

il 


18 

2.e 


h 




Med.D.T:{i£)..., 







de*»d in Clup**' VIU. 



10 EDUCATIONAL TESTS AND MEASUHEMENTS 

When the questions were given to a class of about forty 
bigh-school students, their answers to the questions indi- 
cate the following relative difficulties of the questions: — 
Question number 1 2 3 i 56789 10 

Total credits. . . 187 191 135 lOS 260 8 20 15 293 

In scoring the papers credit was given for answers partly 
right. Since question 10 has the largest number of credits, 
it is shown to be the least difficult, and therefore to have 
the least value. Question 9 has the greatest value. The 
other questions rank between these two in value. ^ 

Another example of unequal values. The difference in the 
values of questions is illustrated by the results of giving the 
following examination in arithmetic to a sixth-grade class. 
Out of a class of 31 pupils, 17 answered the first question 
correctly, 39 the second, 12 the third, and 20 the fourth. 
It is very evident that for the pupils of this particular class 
the second question was the easiest of the four. If the 
second question is considered as having a value of 10 points, 
certainly the other three questions should have higher values. 

1. Write in Roman system: 49, 79. 94, 96, 146. 

8. It 11 A. of land are worth S1485, what is one acre worth? 

3. It a desk is 4| ft. long and 3,'^ ft- wide, what is the perimeter? 

4. How much must you add to 3Gj in. to make a, yard? 

5. A man has to travel 117 mi. After going f of the distance, 
how many miles has he still to travel? 

6. The perimeter of a square is 851 in. What is the length of 

7. Ot 152 chickens a hawk captured 13^%, How many were 
captured? How many were left? 

8. A man saves ?e75.S0 a yr., which is 38% of his income. How 
much ia his income? 

' See page 115 for similar results on the reliatiility of teachers' judgments 
ot the relative difficulty of the spelJing of words. 



INACCURACY OF SCHOOL M.\EKS 11 

B. At $1.38 a yd., what will 37 yds, of carpet cost? 
10. At ¥65.50 an acre, what must a man pay for 25.4 acres of 
land? 

It is easy to understand how a serious element of error is 
introduced when each question is considered to Jiave a value 
of 10 points and the questions are not equal in difficulty. 
The situation is much the same as we would have in meas- 
uring distances if yardsticks of different lengths were 
used, but were considered to be equal. Under such circum- 
stances, a yard would have no definite length, and to say that 
a certain distance was 21.42 yards would convey no definite 
information about it. For this reason the Federal Govern- 
ment has standardized all weights and measures by estab- 
lishing definite units, and before we can obtain definite 
measures of the abilities of children, it will be necessary to 
devise tests consisting of standard units. 

Rale of doing work neglected. In the second place, it is 
customary in giving an examination to allow sufficient time 
for all pupils to answer all of the questions, or if this is not 
done, the papers are graded on the basis of what each pupil 
has done. This manner of giving an examination fails to 
take into account the rate at which a pupil is able to answer 
the questions. Only the quality of the answers is consid- 
ered, and the pupil who answers the questions with diffi- 
culty and who barely finishes in the time allowed, receives 
exactly the same "grade" as the more capable pupil who is 
able to answer the questions easily and who finishes in one 
half or one third of the time, providing the two sets of an- 
swers are equivalent. It is clear that when this is done, the 
"grade" or mark which the pupil receives is not a true 



1« EDUCATIONAL TESTS AND MEASUREMENTS 

measure of his ability, because the rate at which he is able 
ta do work is just as much a factor of bis ability as is the 
quality of what he does. 

Some may insist that it is unfair to the alow-working 
pupil not to allow sufficient time for him to answer all of 
the questions. However this may be, it certainly is unjust 
to the more capable pupil to deprive him of the opportunity 
to demonstrate what he is able to do. This is exactly the 
case when the work asked of him is sufficient to keep him em- 
ployed only a half or a third of the period allowed for the 
examination. This practice of ignoring the rate of working 
probably tends to cause desultory and careless school work. 

Investigation has shown that rapid work and a high de- 
gree of quality or accuracy are not incompatible in arith- 
metic. The same statement could probably be made with 
reference to reading. Investigation has indicated that a 
considerable per cent of pupils can be made more accurate 
in arithmetic by forcing them to work more rapidly. It has 
also been shown that about three pupils out of four make 
progress in speed and accuracy at the same time. In view of 
J facts, it appears that j;ood instruction requires that 

l_teacher give attention to the rate of doing work as well 
othe quality of the work done. The rate at which a pupil 
18 able to do work of a given quality is as much a factor of his 
' ability as is the quality of the work which he does. 

The rate at which a pupil works can be measured very 
easily. It is simply necessary to secure a record of the time 
which he spends in answering the set of questions. When an 
examination is given to a group, it is rather inconvenient 
to secure a record of the time which each pupil spends upon 



INACCURACY OF SCHOOL MARKS IS 

the examination. However, one can secure just as true a 
record of the rate at which each pupil works by making 
the examination long enough so that no pupil finishes in the 
time allowed. For each pupil the number of minutes, di- 
vided by the number of units of work which he did, will give 
his rate of working per unit. 

Wide range of topics included within an examination. 
^ ^ In the third place, examinations are usually made up of 
i\[ _ questions froni a number of different fields within a subject. 
Take, for example, the questions given on page 10, Ques- 
tion 1 calls for a knowledge of Roman numerals; question 8 
asks the pupil to find the cost of a unit when the cost of the 
whole is given; questions 3, 4, and 6 deal with mensuration; 
question 5 calls for the finding of a fractional part of the 
whole; questions 7 and 8 are problems in buying. Thus we 
find six different topics included within an examination of 
ten questions. 

Suppose a pupil receives a " grade " of 80 on this examina- 
tion. Even if 80 is an accurate measure of what the pupil is 
able to do on this examination, it cannot have a definite 
meaning. It does not tell us whether the pupil lacks ability 
in the field of Roman numerals, or in the field of percentage, 
or in some other of the fields included in this examination. 
In order that the total score made on an examination may be 
a definite measure of a pupil's ability, the questions which 
compose it must be drawn from a single field, or at most 
from a small group of closely related fields. If this is not 
done, the scores for each question must be kept separate. 

The situation is much the same as if the length, width, 
h^ght, seating capacity, number of windows, and the num- 



14 EDUCATIONAL TESTS AND MEASUREMENTS 

ber of doors of a room were added together to form a meas- 
ure of the room. If we assume that each of these character- 
istics of the room was measured with a high degree of ac- 
curacy, the total of the numbers expressing the measures 
gives us only very general information about the room. If 
the total is large, we know that the room is probably large; 
if the total is small, we know that it is small. But under no 
circumstances can we be certain that the room has any 
windows or doors, that it contains any seats, or that its 
dimensions are well proportioned. In order- that we may 
have definite information about the room, it is necessary 
that the measures of the several characteristics be kept 
separate. 

It is obvious that the questions of an examination should 
pertain to the significant topics of a subject if the examina- 
tion is to furnish valuable information. In the case of meas- 
uring a room, a number of cbaracteri sties of the room might 
be measured. For example, one might measure the diago- 
nals of the floor and walls, the height of the chairs, the color 
of the walls, the quality of the finish, etc. Such measures 
would be important for certain purposes, but if our purpose 
is to learn about the size of the room they are not very sig- 
nificant, and, hence, are not valuable. For this purpose the 
valuable measures are the width, length, and height of 
the room. If we have another purpose, for example, to 
determine the quality of the lighting of the room, other 
measures are the valuable ones. 

Most valuable topics for education. As yet only a few 
studies have been made to determine the topics writhin our 
school subjects which are most valuable for the purpose of 



INACCURACY OF SCHOOL MARKS 15 

education. Ayers ^ has determined the 1000 words which are 
used most frequently in writing and, hence, whose spelling is 
the most valuable.^ Charters ' has determined the rules of 
grammar which the children of Kansas City, Missouri, need 
to learn in order to correct the errors in their language, both 
oral and written. Freeman ' has analyzed handwriting into 
its significant factors.^ A systematic attempt is being made 
by a Committee of the Department of Superintendence of the 
National Education Association to determine the minimum 
essentials of the common branches. The reports of this com- 
mittee have appeared as the Fourteenth Yearbook,® part i, 
and the Sixteenth Yearbook,^ part i, of the National Society 
for the Study of Education. 

^ Ayers, L. P. Meamremeni of AbUiiy in Spelling. (Russell Sage 
Foundation Bulletin.) 

* See chapter IV. 

* Charters, W. W. and Miller, Edith. A Course of Study in Orammar 
based upon the Grammatical Errors of School Children of Kansas CUy, 
Missouri. (University of Missouri Bulletin, vol. 16, no. 2.) 

* Freeman, F. N. The Teaching of Handwriting. (Houghton Mifflin 
Company, Boston, 1914.) 

* See chapter v. 

* Minimum Essentials in Elementary-School Svhjects — Standards and 
Current Practices. (152 pp. 1915.) 

' Second Report of the Committee on Minimal Essentials in Elemeniary' 
School Subjects. (192 pp. 1917.) 



16 EDUCATIONAL TESTS AND MEASUREMENTS 



QUESTIONS AND TOPICS FOR INVESTIGATION 

1. Wliat should be the puipose of examinationa? Does the fact that 
examination papers are not marked accurately mean thai examioa- 
tions should not be ^veuF 

8. In view of the proven ioaccuracy of teachers' marks, is a differenee of 
leas thaD five per cent between "grades" significant? 

3. In life outside of school is Ibe rate at wbicb one is able to work 
considered ? 

4. Wbicb gives the truer measure of the ability of all pupils', (a) an 
examination of definite length for whieh unlimited time is allowed, 
or {b) an examination tor wbich tbc time is limited so that no one 
finishes? 

5. Should the fact that a pupil fails to complete an examination in the 
time allowed be considered a reason for raising or lowering the mark 
given to that pupil's paper? 

0. Have a set of examioation papers marked ind^endent]y by several 
teachers and compare the seta of marks. If possible secure from the 
teachers reasons for their differencea cjf opinion. 

7. Make a study of the distribution of marks given to the same pupils 
by bigh-scfaool teachers. 

8. Secure a set of examination papers written by pupils whose hand- 
writing you do not know. Mark the papers recording your mark on 
B E^iarate sheet. Do this several times at intervals of a week or ten 
days. Then compare the several sets of marks. 

9. Why should "catch" questions and irrelevant questions be con* 
demned? 




t 



CHAPTER n 



AHITHMETIC 



I. The Fbobleu of Mbasubing Arithmetical 
Abilities 

In measuring a physical object, such as a room, chair, 
haystack, or irregular-shaped field, it is necessary first to 
determine what dimensioDS are to be measured. For ex- 
ample, if our purpose is to ascertain the number of yards of 
carpet needed to cover a floor, only certain characterbtica 
of the room, namely, length and width, are significant. On 
the other hand, if our purpose is to obtain a numerical meas- 
ure of the lighting of the room, other characteristics, such 
as the number, position, and area of the windows, are ea- 
aential. 

It is obvious that, before our efforts to make educa- 
tional measurements can be intelligently directed, we must 
know what we are to measure, and what the significant 
features are. Thus, the first step is to analyze the out- 
comes of instruction and to determine the significant char- 
acteristics of the elemental abilities. 

Numerous arithmetical outcomes have been recognized 
in the statements of the aim of teaching arithmetic, but it 
is generally agreed that among the desired outcomes of 
arithmetical instruction are the abilities required to perform 
the operations of addition, subtraction, multiphcation, and 
division with integers and with fractious, both common 
and decimal. 



18 EDUCATIONAL TESTS AND MEASUREMENTS 

Arithmetical abilities automatic or habits. The pupil must 
be able to perform the operations of arithmetic rapidly and 
with a minimum of attention if he is efficient. As soon as 
he recognizes that a multiplication combination is called for, 
as, for example, 8X7, the response, 56, must be forthcom- 
ing immediately. Time cannot be taken to think out the 
product. The pupil's attention must be reserved for decid- 
ing what operations to perform in dealing with the problems 
of arithmetic. The situation is similar to that which we 
have in any field of action where particular acts occur fre- 
quently and are always the same. Such acts must be reduced 
to the plane of habit if a person becomes skillful. We may 
therefore describe these arithmetical abilities as being 
automatic or habits. 

Arithmetical abilities distinct. A few years ago Stone ' 
investigated the nature of ability in arithmetic and con- 
cluded that it was made up of a number of specific abilities. 
His conclusions have been corroborated by a number of 
other investigations,^ and it is now reasonably certain that 
in teaching the operations of arithmetic, we are attempting 
to engender a number of specific abihties which are rela- 
tively distinct, and not a single arithmetical ability. There 
are as many different abilities as there are types of examples. 
In fact, it is obvious that the ability to add a column of three 

■ Stone, C. W. Arilkmelical AbiUiiei arui Some Faetors Determining 
Tkem. (Teachers College Contributions to Education, no. 19. 1B08.) 

' Ballon, F. W. DetermintTig Vie Achievement of PupiU in Addition of 
Fradiom. (School Document no, 3, 1816, Boston Public Sdiools.) 

Recently an investigation was made, under the direction of the writer, 
of the nature of the ability to place the decimal point in a quotient. This 
iDTeatigation showed that a number of ipeci&c abilities were involved, and 
not a single ability. 



ARITHMETIC 

figures is not the same as the ability to add a column of 
twelve figures. In adding a column of figures it is necessaiy 
that one hold in mind the partial sum until he has added 
tUe next figure. This process must be repeated continu- 
ously until the final sum is reached, and a failure to do this 
continuously will result in stopping the adding, at least 
temporarily. It is a frequent occurrence, for one who is not 
accustomed to adding long columns of figures, to find that 
he has stopped, perhaps has even lost the partial sum, and 
must begin again. The span of attention required in adding 
three figures is short, and pupib who are able to do examples 
of this type with a high degree of skill frequently are unable 
to add long columns of figures with an equal degree of skill. 
In fact, we have no reason to expect them to be able to do 
this type of example until they have practiced upon it. 

Separate types in handling integers. Courtis,' the author 
of the Standard Research Tests, has identified the following 
types of examples in the operations with integers: — 

Addition: (I) addition combinations; (3) single-column 
addition of three figures each ; (3) " bridging the tens," as 
38 + 7; (4) column addition, seven figures; (5) carrying; (6) 
column addition with increased attention span, thirteen 
figures to the column; (7) addition of numbers of different 
lengths. 

Subtraction; (1) subtraction combinations; (2) subtrac- 
tion of 9 or less from a number of two digits, both with 
and without simple "borrowing"; (3) subtraction involving 
borrowing. 



20 EDUCATIONAL TESTS AND MEASUREMENTS 

Multiplicaiwn: (1) multiplication combinations; (8) mul- 
tiplicand two digits, multiplier one digit, and no carrying; 

(3) same as number 2, but with carrying; (4) long multipli- 
cation, without carrying; (5) zero difficulties; (6) long mul- 
tiplication, with carrying. 

Diviaion: (!) division combinations; (2) simple division, 
no carrying; (3) same as number 2, but with carrying; 

(4) long division, no carrying; (5) zero difficulties, without 
carrying; (6) long division, with carrying, "first ease," the 
first figure of the divisor is the trial divisor and the trial 
quotient is the true quotient; (7) "second case, where the 
trial divisor is one larger than the first figure of the dividend, 
but the trial quotient is the true quotient"; (8) " third case, 
where the first figure of the divisor is the trial divisor, but 
the true quotient is one smaller than the trial quotient"; 
(9) " foiu-th case, where the first figure of the divisor must 
be increased by one to get the true quotient." 

Each a specific habit. Each of these types of examples 
requires a specffic habit or automatism. To be sure, certain 
elements, such as the fundamental combinations, are com- 
mon elements, but careful analysis will show that the abil- 
ity to do examples of one type is different from that re- 
quired to do another. Not only will a careful analysis reveal 
this fact, but it has been repeatedly demonstrated by care- 
fully conducted investigations. In addition to the specific 
automatisms which are required for the four fundamental 
operations with integers, a number of other automatisms 
are required for the operations with fractions both common 
and decimal. At present we have only partial analysis of 
the examples in these fields, and for that reason it is not 



ARITHMETIC 21 

possible to state what are the types of examples that are 
within the range of school work. 

These abihties are specific habita or automatic responses. 
Their significant characteristics are the rate or speed of 
performance and the accuracy of the response. Thus, the 
measurement of arithmetical abilities in these fields involves 
determining only at what rate a pupil is able to do examples 
of the elemental types, and how accurate his answers are. 
This is accomphshed by having him do examples of a given 
type for a specified time. From his test paper his speed and 
p)er cent of examples correct may be determined. These two 
quantities represent the measure of his ability to do this 
type of example.^ 

A complete and detailed measurement would require 
that a test be provided for each type of example, but for- 
tunately certain combinations can be made. An example 
in addition consisting of three columns of nine figures each 
includes the addition combinations, simple column addi- 
tion, and carrying. Thus, if a pupil responds satisfactorily 
to examples of this type, we know that he possesses the abil- 
ity to do the types of addition examples involved therein. 
On the other hand, if his response to this type of example 
is unsatisfactory, we do not know just what elemental abil- 
ity he lacks. The use of a single test of this type to measure 

' Strictly speaking the number of essmples done and the per cent of 
examples correct is a measure of the pupil's perfonnance rather than of his 
ability. A pupil's performaiice is affected by many factors audi as his emo- 
tional status, physic^ condition, light, temperature, and the like. Or, it may 
be that a pupil does not try to do his best on a given test. A pupil's ability 
can only be inferred from his performance, but when conditions are properly 
controlled, such inference is reliable in all eicept a few cases. In order to 
avoid an awkward form of statement and bel^ause the practice is general, 
we shall speak of a score as a measure of a pupil's ability. 



23 EDUCATIONAL TESTS AND MEASUREMENTS 

a group of arithmetical abilities has this very obvious limita- 
tion in diagnosing the conditions which exist, but it does 
provide a very satisfactory general survey. 

Why we need to use arithmetical tests. The fundamental 
reason for measurement is to secure information which will 
be helpful in making instruction more effective. > A general 
survey furnishes general information. Such information 
is useful in determining the general effectiveness of the 
instruction. The teacher, however, is primarily concerned 
with details of instruction and with individual pupils, and 
therefore must have detailed information in order to know 
how to adjust the instruction to the needs of the individual 
pupils. She needs to learn what types of examples her pupils 
can do with a satisfactory degree of facility, and what types 
they cannot do. She needs to learn what pupils possess 
standard ability and what pupils do not. A general test 
serves to locate the pupils who are not yet up to standard, 
but a more elaborate test must be used to reveal the exact 
nature of the shortcomings of the pupils. 

Besides the abilities involved in performing the opera- 
tions of arithmetic, there is another large group of abilities 
that fimction in determining what operations to perform in 
solving problems.^ The analysis of this division of arithmeti- 
cal abilities has not been carried as far as in the case of the 
operations of arithmetic. However, it appears that these 
abilities involve knowledge rather than specific habits. 

' The word "problem" is used by sorae writers to designate both "ex- 
amples" and "problems." In this book the word "example" will be used 
to designate exercises which explicitly cull for certain arithmetical opcra- 
tionB. The word "problem" will dtsignate only those exercises which re- 
quire the pupil to determine first what uperaUons are to be performed. 



ABITHMETIC 



n. Standardized Tests for Measuring Arithuetical 
AsiLi-n&s 
1. The Courtis Standard Jtesearck Tests, Series B 
The Standard Research Testa, Series B, or, aa they are 
commonly called, the Courtis Arithmetic Tests, have prob- 
ably been more widely used than any other instrument for 
measuring arithmetical abilities, and as a result we have 
better comparative standards for their use. The series 
consists of four tests, printed on four consecutive pages. 
They are suitable for a general survey of the abihties of 
pupils to perform the operations with integers. ] 

Teit No. 1. Addition 
The twenty-four examples of this test have been con- 
structed so that all have the same form, three columns of 
nine figures each. The following are samples of the ex- 
amples. Time allowed, 8 minutes. 

927 207 136 486 884 176 

379 925 340 763 477 783 



4 194 439 5«7 733 

Test No. 2. SubtTaetiott 
This test consists of twenty-four examples, each involving 1 
the same number of subtractions. The following are samples. , 
|l Time allowed, 4 minutes. 

107795491 75088824 91S00053 87930983 
77197029 57406394 



I 



U EDUCATIONAL TICSTB AND MEASUREMENTS 
Tent No. 3. MultijAieation 
TbU l«8t conHiMtH of Iwcnty-four examples of this type. 
Time allowed ti minutes. 

SfMtl Sd07 £731) tiG4S 0537 

Hfl 7S 85 46 98 



Tent No. i. Division 
Tliia teat coniiists of twenty-four examples of this type. 
Tli»o allowed, fl minutes. 
«5)fl773 M)8fl8« S7)flOW 



7a )M?ea w iauna sshsmo saniass 

In giving the lest tlie pxipils arc directed as follows: — 

Yiwi will tn» g\\vn (light wimitps to find the answers to as many 
«t these ttildiliuii exttiHiiles as tHWsiblf. Write the answers oa (his 
)ka|>er ihrtn-lly tiiiileruetitK the exsniples. You are not expected 
to be aMe to itu tlieiu all. Ytm will be marked for both speed and 
tMvurao'. but it is miire important to have your auswers right thao 
to try a Kit*' mnuy exam))tes. 

Marbiog the papers. Iii mtu-king the test papers, which 
i» tknw bj' the wse ikf a jwiuted aiiswer card wluch is run along 
across the i>a^. no credit is pven for examples partly right 
no* for examples (,>artl)' completed. A pupil's score is the 
uuuttier of examples attemptevl and the number ri^ht. This 
siutpk pJan of maxkiu^ th? paptrrs iiisures imiformity and 

ftCCUHK'y. 

Each of the examples of a te^t calls for the siuueiiunib«rc^ 
operations under approxiuiately the same conditions. This 
mak.eit the examples of each te«C approximately equal in 
itil&,'ult^ . -Vuj exauupit; of the addition test, suy the seventh. 



ARITHMETIC 



iS 



is just as difficult as any other, say the second. Thus, the 
tests consist of twenty-four equal units, just as a yardstick 
consists of thirty-six equal units (inches). The measure of 
a pupil's ability is represented by the distance he advances 
along the scale in the given time, i.e., by the number of ex- 
amples done and by the per cent of these examples which 
have been done correctly. 

Since an example of one of these tests is defined as so many 
oi>erations under certain conditions, it is possible to con- 
struct other tests equal in difficulty. Four forms have been 
constructed. This makes it possible to use a different form 
when the teats are given a secoTid time. 

2. The Clevelajid-Surrey Arithmetic Tests 
These are a series of fifteen tests devised to analyze the 
arithmetical processes, and in this respect differ from the 
Courtis tests described above. The following statement, 
taken from the report of the director of the testing work of 
the Survey, will explain the nature of the teats devised: — 

Spiral nature of the tests. The test, which was given to all of 
the A grades in the system, included a number of different forma 
of each of the (undanieiital operations. Thus, in addition, the 
firstrBJni~5hiipIest exercise of the test consisted in adding pairs of 
figures. Later in the series, addition appeared again, but in a 
more elaborate form. It was here required that a short column of 
figurea be added. The third case of addition consisted in the add- 
ing of fractions of like denominatars. The fourth case consisted 
in the addition oF a longer column of figures. This differs from the 
8hoi±.-cDhimn nddilion in the fact that a greater effort of attention ia 
required in order to complete the addition. Addition of four-place 
figures which requires carrying forward from one column to the 
next, and addition of fractions of unlike denominators, constituted 
the final and most elaborate stages of the addition process. The 
purpose of introducing these various types of addition was to test 



Set B.. 


Set C. . 


SetD.. 


SetE.. 



m EDUCATIONAL TESTS AND MEASUREMENTS 

the ability of the different {grades to perf arm increasingly elaborate 
op^Biipns. Similar spiral tests in subtraction, multiplication, and 
diviaiou were interwoven with the exercises in addition. 

In the second place, the test was so presented that the rate of 
work in the different grades could be determined. . . . The teat 
shows, therefore, both the complexity of the processes which a 
given grade can master, and also the number of examples of a given 
type that can be performed in a specified time,' 

These tests have also been used in the recent school sur- 
veys at Grand Rapids, Michigan, and St. Louis, Missouri. 
Nature of the tests. The nature of the tests devised may 
be seen from the accompanying samples taken from each 
of the fifteen, on pages 37 and 28. 
The time allowances for the several tests are as follows : — 
30 BBCoods Set F. . . 1 minute Set K. . . a 

30 seconds Set G. , . 1 minute Set L. . . 3 

SOaeconda SetH... 30 seconds SetM... 3 

30 aecimds Set I. . . 1 minute Set N. . . 3 minutes 

SOEeconds Set J... 3 minutes Set 0... 3 minutes 

As in the case of the Courtis tests the examples of each 
test are approximately equal in difficulty. Thus each test 
may be considered to consist of approximately equal units. 
In marking the test papers no credit is given for examples 
partly right nor for examples partly completed which 
insures uniformity and accuracy. A pupil's score is the 
number of examples attempted and the number right. 

In considering the completeness of this series of tests it 
must be remembered that decimal fractions are omitted, 
and that two tests are certainly inadequate for the field of 
common fractions. These tests, however, furnish a means 
for securing more detailed measurements of the arithmetical 
abilities of pupils than are possible by using the Courtis 
Standard Research Tests, Series B. 

' Judd. Chas, H. Measuring IkcWork rf the PuHic Schools, pp. ai-se. 



ARITHMETIC 27 

Set A. Addition 

1600417982186 
26512376 4580 



Set B. Subtraction 

7 11 8 12 1 , 18 4 12 
08 6 130788 6 



Set C. Mtdtiplicaiion 
2400542740 

Set Z). Division 
8)9 4)J2 e)Se 2)£ 7}^ 0}9 8)21 

fiie< E. Addition 

52926149 
28883467 
28054251 
05708535 

11 111111 

Set F. Subtraction 

616 1248 1365 1092 716 
456 700 618 472 344 

Set G. Mtdtiplicaiion 

2345 0735 8642 6780 2345 
2 5 9 _^ 6 

SetH. Fractions 

5 6' 9 9" 9"*"9" 9 9 



1 














~^^H 


«8 


EDUCATIONAL TESTS AND MEASUREMENTS 


■ 








Set I. Dimnott 








■ 




4}S54K4 




7)65983 2)58748 




5)41780 


^ 








Set J. Addition 












9 4 


7 


2 9 7 7 


8 


9 


4 3 


i 




2 5 


1 


9 6 » 1 S 





5 


3 1 


1 




4 8 


9 


4 2 6 3 5 


7 


3 


7 7 


6 




8 1 


4 


8 4 7 14 


1 


4 


7 6 


6 




a 4 


3 


5 7 4 1 


8 


6 


9 


1 




7 8 


2 


114 6 8 


5 


2 


2 8 


8 




5 5 


8 


5 3 S 5 2 


1 


S 


9 3 


6 




3 1 


5 


2 9 T 3 1 


3 


9 


5 4 


9 


S 


e 3 


2 


4 2 13 3 


7 


2 


6 5 


7 


3 


1 9 


7 


3 3 6 7 S 


4 


2 


3 4 


5 


2 


4 6 


7 


6 8 6 8 


9 


8 


4 2 


2 


» 


S 3 


1 


7 5 6 14 


4 


5 


8 9 


2 


9 


8 a 


9 


6 5 6 7 5 
SetK. Dwidim 


4 


6 


8 9 


4 




21)441 


32)67r 23)483" 


51)1173 




^^K 






SetL. Multiplication 








^^^H 


8246 




3597 5739 


2648 




9537 


1 


■ 


29 




73 _85 

SetM. Addition 


46 




92 






7493 


8937 8625 2123 


5142 


3691 






8016 


6345 4091 1679 


0376 


4326 






6487 


2783 3844 5555 


4955 


7479 






7591 


4883 8697 6331 


9314 


2087 






6166 


1341 73H 6808 


5507 


8165 










Set N. Division 












67)32763 




48)28464 97)36084 




59)29382 








SetO. FracHona 








,1 


^ 


15^6 

■ 


1 


^ ^-^ ^X 

14 4 4 ^ 


5 
6 




20 1 

21 "^6 


^ 





ARITHMETIC « 

5. The Woody Ariihmetic Setdea 
Measurement by means of a scale. Woody has recently 
devised a set of four scales, one for each of the four funda- 
mental operations. He states that his "fundamental idea 
was to derive a series of scales which would indicate the 
type of problems (examples) and the difficulty of the prob- 
lems (examples) that a class can solve correctly," ' The 
addition scale of Series A is reproduced on page 31. The 
character of the examples and their arrangement are the 
same in the other scales.* Twenty minutes are allowed for 
each scale of Series A, and ten minutes for each scale of 
Series B. This time allowance is sufficient for most pupils 
in grades above the fourth to complete all of the examples. 
The difficulty of each example has been determined and 
the examples of each scale are arranged in order of increas- 
ing difficulty. The author makes only this statement con- 
cerning the selection of the examples. " Each of the scales 
is composed of as great a variety of problems (examples) 
as the fundamental operations can well permit," and only 
those examples " were chosen which were solved by a grad- 
ually increasing percentage of the pupils as one proceeded 
from the lower to the higher grades." 

It should be noted that a "scale" differs fundamentally 
from the testa described above. The examples of a scale are 
not equally difficult. In the case of Woody's scales the score 
of a pupil is a statement of the particular examples which he 
has done correctly. The score of a class is the degree of 

' Woody, Clifford. Measuremevls of Some AdiieFevicnta In Arithmdio, 
p. 1. (Tcacliera College Contribution to Education, no. 80. 1916.) 

' There are two aeries of four scales each. Series B differs from Serial 
A in that it couaiats o[ anly certaia examples taken from Series A. 



30 EDUCATIONAL TESTS AND MEASUREMENTS 

difficulty of the example which was done correctly by just 
50 per cent of the class. Ability as measured by these scales 
means simply that certain types of examples can be done 
correctly and that certain other types cannot be done cor- 
rectly. The speed at which the examples can be done is not 
included in the meaning of ability. Thus "ability" when 
used in connection with the Woody Arithmetic Scales can- 
not have the same meaning as is attached to the word when 
used in connection with the Courtis Standard Research 
Tests, Series B, or other tests of the same type. 

On the basis of such analyses of arithmetical abilities as 
have been made, it is clear that all types of examples have 
not been included. It also appears from Woody's own state- 
ment that those which were chosen were selected not on the 
basis of their arithmetical significance but on the basis of 
the consistency of pupils' reactions, 

Woody's method of selecting examples may be called 
eiatistical as opposed to the analytical method employed by 
Coiirtis and other makers of tests. The statistical method 
neglects the subject-matter field in which tlie test is being 
constructed and assumes that an example is suitable for use 
in a test simply because it is done correctly by a gradually 
increasing per cent of pupils as one proceeds from grade to 
grade. It rejects as unfit those examples which do not have 
this characteristic' On the other hand, the analytical 
method involves a careful analysis of the field of subject- 
matter in which the test is being constructed to determine 
the fundamental types of examples which exist. This method 

' From what we know about the curve of learning It is doubtful whetha' 
this basis can be justified. Ability to do doea not iacreoBe grattually from 
grade to gnule. 



ADDITION SCALE 81 

Name 

When IS your next birthday? How old will you be? .... 

Are you a boy or giri? In what grade are you? 

(1) (2) (3) (4) (6) (6) 0) (8) (9) 

2 2 17 53 72 60 3+1- 2+5+1= 20 

3 4 2 45 26 37 IQ 
- 3 - - - - 2 

" 30 

(10) (11) (12) (18) (14) (15) (16) (17) (18) 25 

21 32 43 23 25+42» 100 9 199 2563 " 
38 59 1 25 33 24 194 1387 
35 17 2 16 45 12 295 4954 
"" ~ 13 ~ 201 15 156 2065 

"" 46 19 



(19) 


(20) 


J21) 


(22) 


(23) 


(24) 




(26] 


) 




$.75 


$12.50 


$8.00 


547 %+% = 


4,0125 %+%+% 


i+% 


SB 


1.25 


16.75 


5.75 


197 




1.5907 








.49 


15.75 


2.33 


685 




4.10 














4.16 


678 




8.673 












.94 


456 






^" 












6.82 


393 
525 
240 
152 














(26) 


(27) 




(28) 


(29) 


(30) 


(31) 




(32) 




12% %+% + % = 


%+y4 


4% 


2V2 1 


113.46 


%+%+%= 


62V2 








2^ 


6% 


49.6097 








12% 








5^ 


3% 


19.9 








87% 










# 


9.87 

.0086 
18.253 
6.04 








(33) 


















• 


.49 




(W)_. 




(36) 




(36) 






(37) 


.28 


ye+%- 




2 ft. 6 in. 




2yr. 5i 


mo. 




16% 


.63 








8 ft. 5 in. 




3yr. 6 


mo. 




12% 


.95 








4 ft. 9 in. 




4 yr. 9 


mo. 




21% 


1.69 












5yr. 2 


mo. 




82% 


.22 


« 










6yr.7 


mo. 




~^^^ 


1.33 




















.86 




















1.01 




















.56 




















.88 








(38) 












.75 


25.091+100.4+25+98.28+19.3614- 








.56 




















1.10 




















.18 




















.56 





















i 



3* EDUCATIONAL TESTS AND MEASUREMENTS 

is well illustrated in the derivation of the Addition of Frac- 
tions Teats by Ballon. These tests are described on page 34. 
Woody makes the following statement with reference to 
the uses of the scales: — 

Perhaps the most valuable use of the scales lies in the diagnosing 
power of the class mistakes. The writer was convinced during the 
process of scoring these test papiers, nearly 20,000 in all, that the 
mistakes of a class tend to be grouped around some ceotral ten- 
dency. The great variety of the problems id these scales, and the 
fact that the problems in each of the various operations proceed 
frwn the simplest to the more difficult problems, aid greatly in the 
location of the weaknesses of the class. If a large number in a class 
fail to invert the divisor in the problems in division of fractions, 
or if a large number in a class fail to locale the decimal point 
properly in the problems in multiplication of decimal fractions, a 
teacher should know immediately that these classes need more 
practice in tiese particular processes. In a like manner, by locating 
the particular types o! problems missed, one should be able to 
direct the work of a class more intdligently. 

To obtain a diagnosis of a class it is necessary to tabulate 
the results for each example. The score sheet for a typical 
class is given in Table 11. The examples not listed in the 
tabulation were not done incorrectly by more than two 
pupils. An example not attempted b indicated by a dash. 
An example done incorrectly is indicated by I. By examin- 
ing the per cent of examples right at the bottom of the 
table one learns the types of examples on which this class 
needs instruction. 

In view of the maimer in which the examples for the 
scales were chosen, it seems reasonable that certain limita- 
tions should be placed upon Woody's claim for the diag- 
nosing power of these scales. It may also be questioned 
whether one example b sufficient to test adequately the 



ABTTHMETIC S8 

Table II 

SHOiriNa IBS TABDIi&TION OT THE BOORB8 MADE BT THE 
INDIVIDUAL PUPIL3 OF A SIXTH-GRADE CLASS DPON TH£ 
WOODT ADDITION SCALE, SERIES A 



r 



34 EDUCATIONAL TESTS AND MEASUREMENTS -. 

ability of a class to do examples of that type. ' For instance, 
the addition combinations are represented by only these 
two, 2+3aiid3 + l. Finally, it should be remembered that 
of the characteristics of specific abilities, accm-acy only, is 
measured. The time allowed is sufficient for most pupils 
to complete the test, 

4. Research Tests in Arithmetic; AddHimi of Fractions 
These tests, devised by F. W. Ballou, Director of the Bu- 
reau of Educational InvestigationsandMeasurement, of the 
Boston Public Schools, furnish a good illustration of tests 
based upon a careful analysis of abilities. The analysis of 
the addition of two fractions revealed 14 types of examples,* 
which arise out of reducing the fractions to a common de- 
nominator and reducing the answer to the lowest form. 
This analysis was corroborated by a preliminary testing of 
pupils. It was found that pupils could do certain types 
of examples and fail on others, showing thereby that ability 
to do examples of one type did not function efficiently in 
doing examples of other types. It is obvious that the 
addition of three or more fractions involves a number of 
other types of examples. It is also obvious that subtraction, 
multiplication, and division each involve a number of 
types of examples. 

In order that both speed and accuracy may be measured, 
a separate test is needed for each type of example. Cer- 
tain types of examples are included in others. The example 

' Recently the writer has collected data which indicates that the diagno- 
sis secured hy the use of Woody's Scales is defective in this respect. 

' Arilkmetic. Determining thf AekieremeJiU of Pupde in the Addition of 
Fraetiom. (School Document no. 3. 1916. Boston Public Schodla.) 



ARITHMETIC 



35 



1 



■ji + Y5 includes examples of the types J- + f and § + ^. 
Becognizing this fact, Ballou has constructed a series of six 
tests to measure in detail the ability of pupils to add two 
fractions. Each test is illustrated by two examples. The time 
allowance for each test is two minutes. 



Addition of FRAcriONa 



Testl 

(8) ; 



Tests 
(2) 5 



Test 5 
CD 2 (2) 1 



5. The Stone Reasoning Test M 

Several tests have been devised to measure the abilities 
of pupils to solve problems involvmg reasonmg but none of 
them have proven very satisfactory. Some years ago Stone^ 

' Stone, C. W. Ariikmeiieal Abilitiea atid Some Fadora Determimng 
Tkem. (Teachers College Conlributiona to Education, no. IB. IS08.) See 
also Stone. C, W., SlandaTdizat Reajaning Tests in Arilhmtlv: and How to 
UtiUat Tkem. (Teschera College Contributions to Education, no. S3. 1S16.) 



S6 EDUCATIONAL TESTS AND MEASUREMENTS 

worked out a reasoning test wliich has been used in several 
cities, and in a number of city school surveys, so that we 
have rather definite standards as to what may be expected 
from its use. 

The Stone RuAaONiffo Test 
. .Grade Name of PupU. . 



Solve as manf of the following problems as you have time 

for; work them in order as Dumbered : 
1. If you buy 2 tablets at T cents each and a book for 65 cents, 

how much change should you receive from a two-dollar bill? 
e. John sold 4 Saturday Eveclng Posts at 5 cents each. He kept 

i the money and with tbe other ^ be bought Sunday papers at 8 

cents each. How many did he buy? 

3. If James had 1 times as much money as George, he would have 
816. How much money has GeorgeF 

4. How many pencils can you buy for 50 cents at the rate of £ 
for 5 cents? 

5. The uniforms for a baseball nine cost $2.50 each. The shoes 
cost 9i a pair. What waa the total cost of uniforms and shoes 
tor the nine? 

6. In the schools of a certain city there are 8200 nupds: i are 
in the primary grades, J in the gramioar grades, j in the H'gh 
School and the rest In the night school. How many pupils are 
there in the night school? 

7. If Si tons of coal cost S2I, what will 5^ tons cost? 

8. A news dealer bought some magazines for 81. He sold them 
for SI. 20. gaining 6 cents on each magazine. How many maga- 
zines were there? 

9. A girl spent i of her money for car fare, and three times as 
much for clothes. Half of what she bad left was 80 cents. How 
much money did she have at first? 

10, Two girls receive SS.IO for making buttonholes. One makes 
*2, the other 28. How shall they divide the money? 
. Mr. Brown paid one third of the cost of a building; Mr. John 
son received $5(H) more annual rent than Mr. Bron-n. How 
much did each rcreive? 
I. A freight train left Albany for New York atC o'clock. An cs- 
press left on the same tradt at 8 o'clock. It went at the rale of 
40 miles an hour. At what time of day will It overtake the 
freight train if the freight train stops after it has gone 3G miles? 



ARITHMETIC 37 

The time allowance for the t«st is fifteen miniitea. Stone's 
plan for marking the test papers allows credit for examples 
partly right and for examples which were not finished. The 
problem values have been determined upon the basis of 
difficulty. It should be noted that this plan for marking the 
test papers is not as simple as that employed for marking 
the test papers on the operations of arithmetic. 

6. Other Reascming Teds 

Courtis included two reasoning teats ' in his Series A. 
Starch has devised a test which is called Arithmetical 
Scale A.' This scale included a number of the problems 
used by Stone, Courtis, and Thorndike. They have been 
evaluated upon the basis of difficulty and arranged in order 
of increasing difficulty. The pupils are allowed as much 
time as they need and a pupil's score is the value of the most 
difficult problem done correctly. 

The problem field of arithmetic has not yet been analyzed 
and we do not know what the fundamental types of problems 
are. A partial analysis ' indicates that there are many types 
of problems of which probably a relatively few are funda- 
mental. Until the fundamental types of problems are deter- 
mined by analysis it will not be possible to devise a test 
which will be as satisfactory as the tests which we now have 

' Courtis, S. A. if anual of Inslruelions for Gicing arid Scorins (he Courtis 
Standard TeiU in the Three R's. (Detroit, 19 U.) These tests are no longer 
published. 

' Starch, Daniel. "A Scale for Meaauring .Ability m .^rithmetie," in 
Jounial of Edueationtil PsynfuAogy, vol. 7, pp. 213-8!. 

' Monroe Walter S, " A Prdiminary Report of an Inveatigation of the 
EcoDom]' of Time in Arithmetic"; Siiteenlh Yearbook of the S'aliontd 
SoaiayfoT tke Stvdj/ of Education, part i. 



38 EDUCATIONAL TESTS AND MEASUREMENTS 

fop the operations of arithmetic. Although it is probably 
wise to use the reasoning tests which are now available their 
Umitations should be kept in mind. 

III. Standard Scores 
The degree of abihty which a pupil of a given grade 
should possess is called a standard. Standards are neces- 
sary to give meaning to th« scores which pupils make. 
In most cases the standards are median ' or average 
scores and thus represent merely the consensus of present 
practice. Such standards are open to the criticism that we 
cannot be certain that our present practice is satisfactory, 
but it seems probable that standards derived in this way 
will not be changed materially in the near future provided 
they are based upon a sufficient number of cases. The topic 
of standards and the use of them is discussed at length in 
Chapter IX and the reader may profitably study it in con- 
nection with the topic of standard scores in this and the 
following chapters. 

i. Courtis Standard Research Tests, Series B. 
Standard Median Scores. In Table III there are given 
three standard scores: (1) general median scores based 
upon distributions of "many thousands of individual 
scores in tests given in May or June, 1915-lC. The dis- 
tribution for each grade was made up of approximately 
equal numbers of classes from large-city schools and 
from small-city and country schools"; (2) the standards 
proposed by Courtis after three years' use of these tests; 
' See page 842 for de&nitioQ of " median." 



ARITHMETIC 

(3) Boston standard median scores after the tests had been 
used for three years. 

With reference to the standards which he has proposed 
Courtis says : — 

The speeds set as standard are approximately the average speeds 
at which the children of the differeat grades have been found to 
work when tested at the end of the year, when for any one grade 
a random selection of five thousand scores from children in schools 
of all types and kinds are used as a basis of judgment. 

Standard accuracy is perfect work, one hundred per cent. This 
is a tentative standard only, as there is available very little in- 
formation in regard to the factors that determine accuracy and 
the effects of more efficient training. 

At present in addition and multiplication it is only very excep- 
tional work in which the median rises above eighty per cent ac- 
curacy, while in subtraction and division the limiting level is 
ninety per cent. 

Standard speeds are not likely to change greatly. Standard 
accuracy is surely destined to approach much more nearly one 
hundred per cent than present work would indicate. 

Standard scores are not only goals to be reached; they are 
limits not to be exceeded. It seems as foolish to overtrain a child 
as it is to undertrain him. All direct drill work should, in the 
judgment of the writer, be discontinued once the individual has 
reached standard levels. If his abilities develop further through 
incidental training, well and good, but the superintendent who, by 
repeated raising of standards, forces teachers and pupils to spend 
each year a larger percentage of time and effort upon tlie mere 
mechanical skills, makes as serious a mistake as the superintendent 
who is too lax in his standards.' 

Comparisons with these standards or any others are valid 
only when the tests have been given under standard condi- 
tions. Slight changes in the method of giving the tests may 
affect the scores as much as the difference in the standards 
from one grade to another. 

' Courlia, S. A. Third. Fotirih, and Fifth Annual Accounting. 1913-16 
(Dcputment of Cooperative Research, Detroit), p. 19. 



40 EDUCATIONAL TESTS AND MEASUREMENTS 



Table m. Stamdahd Median Scores, Standard 
Research Tests, Series B 







A*i«™ 


^K».™.«™ 




Ciumm 


Oradt 


I 


r 
1 


1 


1 


I 


1 


1 


, 


IV... 

V... 

VI,.. 

VII... 

VIII... 


General 

Courtis 
Boston 

General 

Courtis 
Boston 

General 
Courtis 
BostoQ 

General 

Courtis 
Boston 

General 
Courtis 
Boston 


7-1 



8 

».e 

8 
9 

0.8 
10 
10 

10,8 

11, a 

12 


64 

100 
70 

100 
70 

73 
100 
70 

75 
100 
80 

76 
100 
SO 


7.4 

7 

7 

9.0 

9 

9 

10,3 

11 

10 

11 6 

le 
11 

12.9 
13 

12 


80 
100 

80 

100 
SO 

85 
100 

90 

86 
100 
90 

87 
100 
90 


o.e 

6 
6 

7.5 
S 

7 

9,1 

9 

9 

10,2 

10 

10 

n.5 

11 

11 


67 

100 
60 

75 
100 

70 

78 
100 
BO 

80 
100 
80 

81 
100 
SO 


4.6 
,4 

4 

6,1 

6 

6 

8.8 

8 

8 

9.6 
10 
10 

10.7 

11 

11 


57 
100 
60 

100 
70 

87 
100 
80 

90 

100 
90 

91 

100 
90 






r d[ eiampk 



in tbc 111 



I '^ nmlHlls ven de1enilill«d b> Cbuiiia Dd 
of latiulatioDi tJ other " 

I. (DcpsitiDeiltDlCnOricnit 



Slid Mcuure 






.. rib'rd, FoUTlh, and riflk AnmuU 

Tbe Biuton lUndirdi wm atabliihed Etler luing tbc tnta lor Uiirc gait. Bulkni, 
F. W. AriOimttir, Ok Caartii Slattdard Teali in &uln'^ 1911-lS. (Bulletin No. ID il Uw 
DepHrtmenl ' " ' '' ' " 



2. The Cleveland-Survey Teals 
Cleveland and Grand Rapids scores. The tests which we 
have designated as the Cleveland-Survey tests have also been 
used in the school survey made at Grand Rapids, Michi- 
gan. To show what results may be expected from the use of 
these tests we give, in Table IV, the median scores obtained 




^ J 



ABITHMEnC 



41 



in Cleveland and Grand Rapids. In all eases the medians 
are for the lower half of the grades. Upper numbers, medians 
for Cleveland, Ohio. Lower numbers, medians for Grand 
Rapids, Michigan. 

Table IV. Showing the Median Scores for the 
ClbvelanihSurvey Arithmetic Tests 



TvA 


Orade 


3b 


4b 


6b 


6b 


7b 


8b 


A 


13.4 
11.8 


17.8 
13.6 


22.2 
20.3 


24.8 
22.8 


26.7 
26.5 


27.5 




29.5 


B 


9.3 
6.3 


13.4 
9.1 


17.2 
14.7 


19.8 
16.8 


21.5 
21.3 


. 26.0 




22.8 


c 


6.5 


12.0 
7.1 


15.5 
13.7 


16.6 
15.5 


17.7 
17.7 


19 




19.3 


D 


6.3 


12.4 
6.9 


15.7 
12.5 


18.5 
15 5 


20.8 
18.4 


22.5 




20.5 


E 


4.8 


5.3 
4.1 


6.3 
5.2 


6.8 
6.0 


7.5 

7.2 


7 8 




7.8 


p 


2.0 


4.0 
2.8 


6.7 
6.0 


7.5 
7.1 


8.6 
9.3 


10 1 




10.3 


G 


2.0 


3.9 

2.2 


5.2 
4.5 


6.5 
5.3 


5.9 
6.1 


6 6 




6.7 


H 


0.0 


0.0 


5.0 


6.6 

6.2 


7.7 
9.0 


8 5 




8.6 


I 


0.6 


1.1 


2.0 


3.1 


4.0 


4.7 






0.7 


1.3 


2.3 


3.8 


4.0 


J 


1.0 


8.2 


4.0 
3.4 


4.1 
4.1 


4.9 
5.4 


5 7 




5.7 


K 


0.0 


4.0 


6.8 
3.0 


8.5 
5.4 


10.1 
7.5 


12 5 




9.7 


L 


0.0 


1.7 


2.5 
2.3 


2.8 
3.3 


3.2 
4.3 


3 9 




4.9 


M 


1.4 


2.5 


3.2 
3.0 


3.8 
4.3 


4.4 

4.0 


5 1 




5.7 


N 


0.0 


0.8 


1.3 
0.7 


1.7 
1.1 


2.0 
1.7 


2 6 




2.0 





0.0 


0.0 


0.0 


3.1 
3.5 


4.1 
3.9 


6.5 




5.5 



42 EDUCATIONAL TESTS AND MEASUREMENTS ^^M 

3. The Woody ArUkmetic Scales 
' No Htandarda are given for the Woody scales for the rea- 
son that due to their recent development, only tentative 
standards are as yet available. 

4. The Addition of Fractions Tests 
The following tentative standards have been worked out 
in Boston from the use of these tests: — 

Table V. Boston Medians: AnnrnoN or Fractions 


VII. 


s 


Tutl 


™> 


Ted 3 


7^. 


rtrts 


Teiia 


\ 


I 


1 

1 


S 


1 
1 


r 
1 


I 


1 
1 


1 

1 


1 
1 


I 


1 " 

1 


1130 


W.7 


BB.O 

as. 3 


n.G 


z 


5.5 


4a. 1 


e.o 


E 


4.0 


Bl.O 
63.4 


4.4 


48.S 
16.5 


5. The Stone Reasoning Test 
While this test has been used in many cities, in few- 
have all of the upper grades been tested and the records 
been kept separately by grades. Stone tested in twenty- 
six different cities, but used only the 6A grade. The test 
has been applied by others in comparison, but also using 
only the 6th grade. The sixth-grade scores for the twenty- 
six cities tested by Stone give the following results: — 

Lowest 3.5fl 

Middle 5.50 

Highest 9.14 




Scorai ij"" 1 2 J 1 .-. -; 7 S 
Fia. 2. DwmiBUTioN of Scores with the Stone Reaboninq Tebis 

BqjrOKntiliB the ptmnUgeiif cbMicn nuUng the giveii •nun in rnxniing probtfnu. 
For ouiple. 18^ «nt <if tbe Sftb-gnde cbildien mule ■ Koie of 0; 1« per tan uiadF > 

n the gndit BboTC, uul hair mmay fall bcLow the 



44 EDUCATIONAL TESTS AND MEASUREMENTS 

In three cities where school surveys have been made re- 
cently the scores were taken separately by grades. The 
results in these three cities are given in Table VI. The 
wide distribution of scores for Butte is shown in Fig. 2, 
given on the opposite page. 

Table VI. 



Grad. 




'■ss"- 


Sail Lake Cily, 
Utabt 


V ...... . 


3.9 
3,8 
7.7 


58 
4'5 





















of SuitenqTBullc. tfanfniu, p. 88, i\»U.> 
Ike Srhcoi St/'tem of Bridgeparl. Conn., p. lOi 
ml 5«iMn r^ Sail Lain Ci'k, Vtak, p. IBS, (] 
B iiirvey work in these atxt. wa<M indint 
y td RUOD, Ihe crhildicD ot SiK Lake Gty h: 



Jlecently Stone has issued the following standards : — 



That 8 



?hat 80 per cent or more of 5th grade pupils reach or exceed a 
re of 5.5 with at least 75 per cent accuracy; that 80 per cent or 
more of fitli grade pupils reach or exceed a score of 6.5 with at 
least 80 per cent accuracy; that 80 per cent or more of 7th grade 
pupils reach or exceed a score oi 7.5 with at least 85 per cent accu- 
racy; that 80 per cent or more of 8th grade pupils reach or eiceed 
a score of 8.75 with at least 90 per cent accuracy.' 

The accuracy of individual scores. In considering the 
accuracy or preciseness of the measures obtained by the use 
of these tests, it should first be noted that in most of these 

' Stone, C. W. Standardized Reasoning Tesla in Arilkmelic and How lo 
Ulilia Theia. (Teachers College Conlributions to Education, no. 83, 1916.) 



ARITHMETIC 



W tests and the manner of giving them the sources of error 
I mentionedoa pages 8-13 are eliminated. The plan of mark- 
'' ing all examples as either right or wrong insures uniformity 
and accuracy in marking the papers. The rate at which the 
pupil works is measured as well as the quality of his work. 
The exercises are either equal in value or their difficulty 
has been expressed in terms of a common unit. By providing 
each pupil with a printed list of the examples, copying the 
example either from dictation or from the board is elimi- 
nated. It has been shown that pupils do not copy accurately 
nor do they copy with equal speed. The elimination of copy- 
ing the examples eliminates a probable source of error. 

There are, however, other factors to be considered. /A 
pupil's performance or actual achievement from which nis 
ability in a given test is inferred depends upon his physical, 
mental, and emotional condition. These change from day to 
day, and from hour to hour and cause marked variation in 
successive scores of some pupils. jThis variation is much 
greater in some pupils than in others. At any particular 
time a small per cent of the pupils will be found upon a 
plane higher than their normal ability, while other pupils 
will be found at the low ebb of their ability. . This fact causes 
some of the measures to be unreliable in the sense that they 
are not true indices of the average or normal abilities of 
certain pupils. However, this happens in only a relatively 
small per cent of the cases when care is exercised in giving 
the tests to secure standard conditions, and the number of 
these cases can be materially reduced by giving the testa 
a second time and taking the average of the two sets of 



46 EDUCATIONAL TESTS AND MEASUREMENTS 

Courtis ^ states that " about one child in ten will have a 
markedly unreliable score," and " for two thirds of the chil- 
dren the differences will be relatively small." 

Gain or loss in repeating tests. To test the accuracy of 
the individual scores, and the estimate of Courtis, the writer 
had the four Courtis tests of Series B given to the pupils 
in one school a second time. In order that the pupils might 
not do the same examples, they were asked, on the occasion 
of the second trial, to begin with the last example and work 
forward. The results are given in Table VII. For addition, 
the table is read as follows: One pupil did seven fewer ex- 
amples the second trial, three did four fewer, four did three 
fewer, five did two fewer, etc. The addition scores aver- 
age .15 of an example less on the second trial than they 
did on the first. Eighty-two per cent of the scores dif- 
fered by one example, or agreed. Although this table in- 
cludes too few cases to warrant any final conclusions, it 
is interesting to note that the facts are in accord with 
the statement of Courtis. 

IV. How TO HANDLE WHAT THE TbSTS REVEAL 

Scientific management. During the past few years scien- 
tific management has been applied to many forms of human 
endeavor with results which have been nothing short of 
marvelous. For example, bricklaying has been practiced 
by intelligent artisans for centuries, and one might suppose 
that in the course of that length of time a highly efiicient 
system of laying brick would have been evolved on the basis 



H 1 

8 J 









Hi 


S g s s 


' fi 


fi 


-.15 

-.10 

,18 

-.28 


' ' 


1 


1 S S S 


i < 


« 


1 1 1 1 


1 1 


' 


H ^ 1 - 


« » 


•■ 




03 * 


« 


" 2 S * 


s ° 


- 


3 2 S S 


8 S 


•= 


S; S 3 S 


H 1 


7 


s SI a a 




1 


•o so m t^ 


s : 


1 


■• 2 " — 


s " 


T 


« - , - 


" " 


? 




. 1 


T 


1 1 [ > 


1 1 






1 1 


T 


-11, 


- ". 


? 


1 - 1 1 


„ « 




III.. 


— "^ 


1 


Addition 

Subtrsction.. 
Multiplication 
Division 


1 1 



48 EDUCATIONAL TESTS AND MEASUREMENTS 

of q^identa) successes and imitation, if in no other way. 
It appears, however, that the method followed has remained 
practically unchanged for centuries until recently, when the 
principles of scientific management were apphed to the 
process. A scientific analysts of the process revealed that 
eighteen motions were made in laying each brick, while only 
five motions were needed when the material was properly 
arranged. 

Another striking illustration is given by Frederick W. 
Taylor in his book. The Prindples of Scieniijic Manage- 
ment. Some years ago a large quantity of pig iron was being 
loaded on flat cars at the Bethlehem steel plant. Pig iron is 
cast in blocks, each of which weighs ninety-two pounds. 
The method of loading was for the workman to pick up a 
pig, walk up an incline, deposit it upon the car, walk down, 
and repeat. The average amount of pig iron loaded per roan 
per day was twelve and one half tons. It was very crude 
labor, and obviously the amount of pig iron which a work- 
man might load in a day depended upon his physical strength 
and how he used his strength. If he worked rapidly, with little 
rest, he soon became exhausted. If he rested too frequently, 
he wasted his time. A worlunan's efficiency depended upon 
his rate of working and the length and distribution of his 
rest periods. The principles of scientific management were 
apphed to the process, with the result that one workman 
loaded forty-seven and one half tons a day and the length 
of the working day was shortened. 

These two illiistrations are typical of a large number of 
instances in industrial activities where the efficiency of work- 
men has been greatly increased by the application of the 



ARITHMETIC 49 

principles of scientific management. They suggest ^that 
the instruction in our schools may be made more efficient 
by the application of the same principles. For the field of 
education the principles of scientific management involve 

(1) an analysis or diagnosis of the teaching situation, and 

(2) the selection of methods and devices of instruction to meet 
the situation revealed. 

Diagnosis of the teaching situation. In the consideration 
of the problem of measurement, it was shown that there 
are as many specific abilities involved in performing the 
operations of arithmetic as there are types of examples. 
These operations must be performed with a minimum of 
attention, so that the focus of the attention may be devoted 
to the consideration of problems. Thus, the teacher has the 
problem of engendering in each of her pupilsa large number 
of automatic abilities or specific habits. 

The individual dilferences of pupils furnish another factor 
of the teaching situation. Pupils differ in native ability and 
in their past experience. Some pupils are eye-minded, some 
are ear-minded, and still others are motor-minded. These 
differences become prominent in their learning. Some pupils 
grasp quickly the response which is to be made by seeing 
another perform it, others require a detailed explanation, 
and still others progress most rapidly by being allowed to 
reason out the appropriate response. Pupils also differ in 
the amount of practice they require to reach a given de- 
gree of facihty m performing an operation. 

These two conditions make the teaching situation which 
the teacher faces a complex one. Before she can intelligently 
direct her efforts as an instructor she must diagnose the 



BO EDUCATIONAL TESTS AND MEASUREMENTS 

situation.' The tests described in this chapter furnish 
a means for doing this. The scores reveal the shortcom- 
ings of classes and of individual pupils. 

Pupils* and class records charted. Courtis has devised 
a number of simple charts which can be used to exhibit 
graphically a pupil's record, compared with the standard 
scores. One form of these is shown in Fig. 3 which gives 
the record made by a sixth-grade pupil with the Courtis 
Standard Hesearch Tests, Series B. The heavy line drawn 
across the chart shows the individual standards set by 
Courtis; the dotted line, the scores made by the pupil. A 
comparison of the actual scores made by the pupil with the 
heavy line reveals at once the examples on which this pupil 
needs instruction. In some of the fundamental operations 
he has been drilled until he is ahead of the standard for his 
grade. Fig. IS in Chapter VIJI shows another form of score 
card, and exhibits a pupil's entire record not only in arith- 
metic, but in reading and writing as well. The scores ob- 
tained by giving the Cleveland-Survey Tests are interpreted 
in the same wa,y, and yield a more detailed diagnosis of the 
arithmetical abihties of a pupil. 

In a similar way the strength or weakness of a class or a 
school may be shown by comparing the class or school 
medians with the standards set for the class or with the 
median results in the same test in other classes or schools. 
This is well shomi in Fig. +, which compares the median 
scores obtained for the whole city by the Cleveland siu'vey 

' Additional auggeations for diagnosing the shortcomings of pupila are 
given in the Teacher't Manual for Ihe Courtis Standard Practice Teats 
(World Book Company, Yonkera, New York), and by Stone, C. W., 
Standardised Beatoning Testa in Arithmetic and Haw to VtUize Them 
fTe&chers College Conlributioiu to Education, no. 63. 191G\ 



^^H ARITHMETIC 


..M^ 






Di^, 


AllempU 


KigM, 


Allempli 


RigUi 


AUempH 


Righlt 


AlUmpi, 


-fivUi 


84 


u 


24 


24 


24 


24 


24 


24 


23 


23 


83 


23 


23 


23 


23 


23 1 


ea 


22 


22 


22 


22 


22 


22 


22 ^ 


21 


il 


21 


21 


21 


21 


21 


81 H 


so 


80 


20 


20 


2Q 


20 


80 


80 H 


19 


19 


19 


19 


19 


1» 


19 


1* 1 


18 


18 


18 


18 


IS 


IS 


18 


IS 


17 


17 


17 




17 


17 


17 


17 


IS 


IS 


10 




10 


IS 


IS 


10 


W 


15 


IS 




IS 


IS 


IS 


15 


1« 


14 


14 




14 


14 


14 


14 


IS 


13 


'13-., 




13 


13 


13 


IS 


12 


12 


/l2 




18 


12 


^18"- 


-18 


11 


11 


fir 

' 9 
S 


-;js 


*-l\ 


" 


/ 11 


11 

'I 

— 8 


Staj^ 10 

Score 9 
8 


Q 
8 , 


8 






K- :-- 


7,' 


7 




7 


7 


7 


7 


e! 


6 




6 


S 


6 


S 


5 


-V 


* 




S 


s 


5 


s 


4 


4 


4 




4 


4 


4 


i 


3 


3 


3 




3 


3 


3 


3 


1 


2 
1 


2 
1 




2 

1 


2 

I 


2 
I 


1 


Pig. S. a Cbart, Showing the Stores made by a Sixth-Grade Pupil, 

IN COMPARISON WITH THE STANDSRO ScORES, CSINO THE CoOHTlH 

AiuTBiiiiTtc Tests 


OD the D teat in simple division (see page 27) with the median 


scores in five selected schools. The need for closer super- | 


vision is at once evident. "^^1 



58 EDUCATIONAL TESTS AND MEASUREMEN': 




i 



I wooidrii^r) 



[cE] 



I Tod I 



D. DiviiioD. See pi 



ARITHMETIC S3 

Tbe Woody Arithmetic Scales also furnish a diagnosis of 
the class when the scores are tabulated as shown in Table II, 

A more detailed study of the errors of those pupils who 
are below standard is needed to ascertain the causes of their 
weaknesses. A method ' which has been used with success 
is to have the pupil being studied do the examples orally 
and note the errors which he makes. This gives the teacher 
a clearer understanding of the pupil's mental processes. 
By means of this method it ha.s been found that "a large 
nmnber of errors is due to the incomplete automatization 
of the simple facts of addition, subtraction, multiplication, 
and division." 

Meeting the situation: Laws. In meeting the situation 
revealed in the case of the operations of arithmetic, the first 
prerequisite is that the general method of instruction be one 
suited to engendering specific habits or automatic responses. 
The laws governing the engendering of specific habits have 
been quite definitely established. 

Stated in psychological terms, the first law is that in the 
begitming the attention of the learner shall be focalized 
upon the habit to be acquired. In terms of schoolroom prac- 
tice this means that the learner shall understand what re- 
action is to be made to a given stimulus, and shall then 
react to it in tbe appropriate manner. This gives the learner 
the right start. 

The second law is that the accomplishment of the step 
outlined in the first law shall be followed by attentive re- 
petitions. It is not sufficient that there be simply repeti- 



54 EDUCATIONAL TESTS AND MEASUREMENTS 

tions or drill. Tlie drill must be attentive. In the case of 
the operations of arithmetic this drill may be detached from 
the solving of problems, or it may be givea in the solving 
of problems. 

The third law states that no exception shall be permitted 
until the habit is firmly established, which means that the 
attentive practice must be continued until the operation 
has become a habit, that is, has been made automatic. 

The instruction based upon these laws must be adapted 
not only to the needs of the class, but also to the needs of 
the individual pupils which the tests reveal. The class needs 
can be met by placing emphasis upon the types of examples 
which the pupils as a group are imable to do with standard 
ability. This emphasis may mean simply more drill, or it 
may be that the difficulty is due to the pupils' not understand- 
ing how the operation is to be performed. If the latter is the, 
ease, explanation, or illustration, or opportunity to think it 
through is needed. 

In order to be most effective, the repetitions must be 
attentive. This means that the drill must be eflectively 
motivated. Arithmetic is one of the best liked of the school 
subjects. This is particularly true of the operations. This 
being the case, the motivation of drill in arithmetic is a 
comparatively simple matter, and in most cases it will be 
sufficient simply to start the pupils to work and to keep the 
work from lagging. When more than this is necessary the 
teacher must demonstrate her resourcefulness by providing 
an effective method or device for the motivation of arithmet- 
ical drill. In the lower grades the playing of certain games 
provides practice upon certain tj-pes of examples. In the 



1 



ARITHMETIC BS 

upper grades ciphering matches, or, better, the setting of 
definite standards in both speed and accuracy, are very 
effective motives. 

Individual vs. class needs. However, classes are com- 
posed of individual pupils who differ in their needs. Only a 
few of their needs will be common to the class as a whole. 
The usual class instruction in arithmetic does not meet 
these needs. Frequently the writer baa visited classes in 
arithmetic which were being drilled upon the fundamental 
operations. A fairly uniform procedure was followed. The 
same example was dictated to all of the pupils, regardless 
of whether they needed drill upon this particular type of 
cjcample or not. Naturally some pupils finished very quickly, 
and, as they waited for their classmates to finish, there was 
a tendency for them to become disorderly — ■ a perfectly 
natural tendency. When a majority of the class had finished 
the example the teacher stopped the work and read the cor- 
rect answer. The process was then repeated. The result was 
that those pupils who worked slowly completed few, if any, 
examples during the entire period, and, therefore, received 
little satisfactory drill. The bright pupils spent a con- 
siderable proportion of their time waiting on the other 
members of the class, and probably did not need the par- 
ticular kind of drill which they received. 

The scores of a group of pupils do not cluster closely 
about the median. When the distributions of the scores for 
successive grades are compared a great overlapping is found. 
Some pupils in the fourth grade make higher scores than 
a number of the eighth-grade pupils. Table VIII shows 
the distribution of pupils in a certain city according to the 



56 EDUCATIONAL TESTS A^fD AfEASUREMENTS 

number of examples attempted in the subtraction test of the 
Courtis Standard Research Tests, Series B. An examination 
of this table reveals these facts. 

In the fourth grade 23 per cent of the pupils reach or 
exceed the fifth-grade median. 

In the fifth grade 23 per cent of the pupils reach or exceed 
the sixth-grade median. 

In the sixth grade 24 per cent of the pupils reach or exceed 
the seventh-grade median. 

In the seventh grade 40 per cent of the pupils reach or 
exceed the eighth -grade median. 

This condition is merely typical. It is due in part to 
native individual differences and in part to training. In one 
experiment' three types of drill were used:— 

(1) Class drill supplemented by individual assistance on the 
points of weakness as diagnosed by the results of the teat; (2) class 
drill with extra drill periods provided for the slow pupils, who were 
drilled in groups rather than individually; and (3) merely class 
drill with explanations to the class as a whole. 

After these types of drill tad been used for a month and 
the results carefully measured, the following conclusions 
were reached: — 

(1) All three types of drill produced very large increases in the 
achievements of the pupils. (2) Class drill supplemented by indi- 
vidual help at the points oF weakness as diagnosed by the first test 
proved much more efficient on account of the exceptional decrease 
in the variation among the members of the class. This decrease 
in variation was shown by the decrease in the quartile coefficient 
of deviation. (3) It has been shown in both the first and second 
types of drill that individual variations which some writers ascribe 
to hereditary influences may be greatly modified by appropriate 
instruction. 

' Smith, James H. "iDdividusl V'iiriatio[iainArithmetic";iD£';n[eniarii 
Sohool JouTnal, vol. IT, p. IBS. 




I 

J 



.5 la 2 



n 



1 




" " " 2 S 


8 


o 


- 


s 


- 


s 


- 


ft 


- 


s 




* 




a 


« - - 


a 


- ' ' 


a « - « 2 


5 - » «. 3 2 


S 


' - S S 2 


• 


- ■» 2 2 - 


« 


"5225 


- 


s a s ' ' 


» 


S 2 " " 


" 


2 2 «■ - 


, ... -1 




5 " -^ - 


" 


±- - 


- 


^ 






s > sss 



1 

58 EDUCATIONAL TESTS AND MEASUREMENTS 

Obviously, it is the pupil who performs the operation 
slowly and with difficulty who needs practice. The pupil 
who already is skillful in performing the operation does 
not need further drill. Our present procedure provides 
drill for the pupil who does not need it, and prevents the 
pupil who does need it from receiving it in a satisfactory 

' ' Repeating the tests after an interval. The effect of class 
instruction b shown when a test is repeated after an inter- 
val of a few months. In Table IX the midyear ' distribu- 

Table IX. The Hamge of Nttmber of ExAJapLEs Attempted 


^ 


....^on 


TololS«.g. 




Range of mMilt 


btr Bf aanipta 


IV 

V 

VI 

vn 

vin 


1-10 
1-lT 
2-14 

»-ie 
2-ie 

4-84 
3-24 
4-34 
4-18 
fr-17 


10 
IT 
IS 
14 
11 

ei 

it 
ei 

15 
13 


3 
5 

4 
6 
5 
7 
6 
8 
8 
8 


6- 6.9 

0- 8.S 

7- 7.0 

8- 9.0 
7-8,4 
2-11.0 
9-10,0 
5-18 3 
1-10 8 

9-ia.i 


8,3 
3.3 
8.3 
8.« 
2.7 
3.8 
3.1 1 

1 


For tadi grade Ue uppo Jine » for January aod the bwet iioe i, [oi May. ^ 

tions for a certain city on the addition test of Series B are 
compared with those for May. With only two exceptions 
both the total range of the scores and the range of the middle 


fifty per cei 

' TTie first t 
were given to 


t were incr 

eat was given 
ibout one hun 

L 


eased. Thi 

ust after the n 
dred and fifty 


fact shows that the in- 

lidyear promotions. T!ic testa i 
pupils in each grflda. 



AfilTHMETIC SO 

stniction in addition whicii these pupils received was more 
appropriate for the brighter pupils than for those who were 
below standard ability. Some pupils acquired abilities far 
in excess of the standard for their grade, while others re- 
mained conspicuously below the standard. This is merely 
what we should expect, because those pupils who have pro- 
fited most under our system of instruction may be expected 
to continue to profit most. Obviously, if our standards are 
wisely determined, the pupils who are below standard i 
ability need instruction and those who are conspicuously 
above standard may spend their lime more wisely upon other 
subject-matter. If this is not feasible, the methods and 
devices of instruction should be those moat appropriate to 
those pupils who are below standard. If the methods of 
instruction are unchanged, it is obvious that the pupils who 
have learned most readily will continue to do so. 

The bright pupil should receive consideration as well a 
the backward pupil. The usual class instruction does not. i 
give the bright pupil efficient training. He is not forced to 
exert himself. Much of the time he is forced to be inactive, 
Furthermore, in the case of the tool subjects (the operations 
of arithmetic, reading, spelling, handwriting and language) 
training beyond a certain point is not very profitable. 
arithmetic the bright pupil should be ^ven problem work 
rather than additional training upon the operations. 

Modifying tiie class drill. The type of class instruction -1 
described on page 55 can easily be modified so as to insure 
that the slow-working pupils will get some satisfactory drill. 
Instead of dictating only one example at a time, the teacher 
can dictate several, and stop the work as soon as a few of the 



eo EDUCATIONAI- TESTS AND MK\SUREMENTS 

faster workers have finished. The slow-working pupils will 
have some examples completed. 

The teacher must recognize that the rate at which the 
pupil performs the operations is important, as well as the 
accuracy. This means that in teaching the teacher must 
obtain a measiire of the pupil's speed, as well as a measure 
of his accuracy. If examples are dictated in groups, and the 
work stopped as suggested in the above paragraph, the 
number of examples which the pupil does during the class 
period is a measure of his rate of working. The per cent 
correct is a measure of his accuracy. 

The instruction can be made still more effective if the 
teacher will prepare a number of sets of examples, each set 
being confined to examples of the same type. These sets 
of examples should be written on cards. Then, instead of 
dictating examples, the teacher can distribute the cards 
and have the pupils copy the examples from the cards. If 
the teacher studies the needs of her pupils, it will be possible 
for her to distribute the cards so that each pupil will have the 
type of example upon which he needs j>ractice. The pupil 
is probably injured by being required to practice upon the 
wrong type of example and, hence, it is very important that 
each pupil be given the type of example upon which he 
needs practice. 

Use of practice tests. Courtis has devised a set of Stand- 
ard Practice Tests ' which automatically diagnoses each 
pupil and furnishes the practice which he needs to remedy 
his defects. These tests consist of forty-eight sets of exer- 

' Fiill details regsrding Iheae tests may be obtained from the publisb- 
en, World Book Compaay. Yoakera, New York, and Chicago, lllinui 



^ 



J 



ARITHMETIC 61 

cises, which "have been designed to cover every known 
difficulty in the development of ability in the four opera- 
tions with whole numbers." The latest form of these tests 
(1916) is arranged so that the pupils begin the series by 
taking Lesson 13, a test involving all types of examples 
found in the first twelve lessons,' All pupils who attain 
standard ability on this test are excused from the first twelve 
lessons, because they have demonstrated that they do not 
need the instruction which these lessons provide. As soon 
as a pupil who did not attain standard ability on Lesson 
13 has finished the first twelve lessons, he takes Lesson 13 
again to show that he is now up to standard. Lessons 30, 
31, and 44 are also test lessons, and are used in the same 
way. 

Each of the lessons is printed upon a card and a copy 
is furnished to each pupil. The card is placed beneath a sheet 
of transparent paper and the example is read through the 
paper, the work being done on the paper. The lessons have 
been constructed so that the standard length of time re- 
quired to complete each one is the same. They are also 
self-scoring. These two features relieve the teacher of the 
laborious work of scoring the papers, and make it possible 
for different pupils to be working upon different lessons at 
the same time. Thus, when a pupil has demonstrated that 
he is up to standard on any tjT>e of example, he may at 
once go on to the next lesson. If he is not up to standard 
on any lesson his work makes the fact obvious, and he 
can remain upon that lesson until he acquires the necessary 

• All lessons except the test lessons are confined to a single type of 
example. 



6? EDUCATIONAL TESTS AND MEASUREMENTS 

ability without interfering in the least with the work of the 
other members of the class. 

Thus, individual progress is provided for, and at the same 
time the group formation is retained. A considerable saving 
of pupil's time is effected by excusing from drill those pupils 
who demonstrate that they possess standard ability. These 
pupils can spend this time upon other work. 

These " Standard Practice Testa " also simplify instruction 
in ungraded schools. The same lessons are used for all pupils 
in grades four to eight. Only the time allowed differs. 
Thus all of the pupils in a rural school could be instructed at 
the same time and each pupil receive the practice which he 
needed. 

Another series of exercises, known as the " Studebaker 
Economy Practice Exercises," and based upon some of the 
same general principles, has been devised by J. W. Stude- 
baker, Assistant Superintendent of Schools, Des Moines, 
Iowa. They are published by Scott, Foresman & Company, 
New York and Chicago. Other series of practice exercises 
have been devised, but, so far as the wTiter has examined 
them, they are less complete and give less promise of 
efficient means of instruction. 

However, it must not be forgotten that any set of prac- 
tice exercises are merely teaching devices. It is more impor- 
tant that the teacher explicitly recognize in her thinking 
that she is instructing a group of pupils who differ widely in 
- native abihty, experience, and training, that all do not learn 
in the same way and that a limitation should be placed ujxin 
training. When she explicitly recognizes these facts, the 
resourceful teacher will find many devices which will be 



aevices wnicn wui oe i 



ARITHMETIC 63 

helpful in adapting the instruction to the needs of the 
pupils. 

QUESTIONS AND TOPICS FOR INVTSTIGATIGN 

1. Which of the first three t«sta described in this chapter would you 
select in order to secure the most helpful diagnosis of the class and oF 
the individual pupils? Why? 

i. How can the teats described in this chapter be used by the teacher 
to make her instruction more elective? 

S. Do you think pupib will welcome definite objective atondords and the 
use of standanlized testa? Why? 

4. It you are using slandardiied tista make charts showing class (or 
individuaJ) scores in comparison with the standards. Some teachers 
have found it helpful to have auch charts hung in the classroom. It is 
also helpFul to bring such charts to the attention of the patrons of the 

G. Make a chart showing how the pupils of your class compare with 
other classes of the same grade and with classes of other gnujea. 

e. Suppose a pupil is unable to do satisfactorily certain types of exam- 
ples. How would you proceed to locate his particular difficulties? If 
you are teaching arithmetic try out your plan on some of your pupils. 

7. What devices do you use to provide each pupil with the training which 
he needs? What devices are suggested in this chapter? Can you 
suggest additional ones? 

B. Pupils who are excused from drill because they do not need it should 
spend their time doing profitable things. Suggest a number of assign- 
ments which might be made to euch pupils. The asaignments may be 
in aubiects other than arithmetic if il seema wise, but they should be 
such as not ti] interfere with the instruction of the other pupils. 

0. How do you know that the methods and devices of instruction which 
you are now uaing are the beat." How could you find out? 

10, How do you know that you are not giving too much time to arith- 
metic? How could you find out? 

11. Is a class score which is conspicuously above standard a sign of 
superior teaching? Why? 

14. Construct two teatK, each being confined to a single tj-pe ot example. 
Give both tests to the same pupils under the same conditions. Com- 
pare the two seta of scores. 
13. Scientific experimentation will be necessary to determine the best 
plans of grouping pupils for instruction. These plans are worthy of & 
trial, 
a. In s building place together for drill those pupils which are 
most nearly equal ia ability as shown by the tests. 



64 EDUCATIONAL TESTS AND MEASUREMENTS 

b. Excuse from drill those who bave demonstrated UiBt ttey 
above standard. 

c. Have a special "hospital" e-lassfor those pupils 
materially below standard, A pupil's sentence to the "hospital' 
would be until he was up to Gtaadard. 



BIBLIOGRAPHY 

Only the most important references are given here. AddiUonal 
ences are given in footnotes in the chupter. 

I. Tests in the Fundamental Opei 

J. Courti) Standard Research Tesla, Series B, The testa may be secured 
from S. A. Courtis, 82 Eliot Street, Detroit, Michigan. 

Referenceb: Courtis, S. A. Tkird, Fourth, and Fifth Annual 
Accounlingi, 19IS-J916. (Department of Cooperative Research. 
82 Eliot SUeet, Detroit, Michigan.) 

Ashbaugh, E. J. The ArUhmelical Stilt of Iowa School Children. 
(Bulletin no. 21. Exteosion Division, University of Iowa.) 

Ballou, Frank W. Educational Standards arid Educational Meas- 
urements, wilk Particular Reference to Standards in the Four Funda- 
mentids in Arithmeti/:. (Sciiool Document no. 10. 1914. Boston 
Public Schools. Bulletin no. 3, Department of Educational Investi- 
gation and Measurement.) 

Haggerty, M. E. "Arithmetic: A Cofiperative Study in Educa- 
tional Measurements." (Indiana University Studies, no. 2T.) 

Haggerty, M. E. {editor). Stiidies in Arithmetic, (Indiana Uni- 
versity Studies, no. Si, vol. 3. September, 1916.) 

Monroe, Walter S. A Report of the Vie of the Courtis Standard 
Researeh T^s in Arithmetic in Twenly-fcniT Cities. (Kansas Slate 
Normal School. Emporia; Bulletin, new series, vol. 4, no. 8.) 
S, Reseamh Tests in Arithmetic, AddUioii of Fractions, designed by F. W. 
Ballou. Copies of these tests are not obl^nable. 

Referbnci;: Arithmetic. (Bulletin no. 7, Department of Educa- 
tional Investigation and Measurement, Boston.) 
S. The Cleveland Surrey Arithm^ie Tests. Copies of the test papers may 
be obtained from Charles H. Judd, School of Education, University 
of Chicago, Chicago, Illinois. 

Refehences: Judd, Charles H, Measuring the Work of the Pufdie 
Schools. (Cleveland Foundation, Survey Report, Cleveland, Ohio.) 
Also for sale by tbe Russell Sage Foundation, New York City. 

Smith, James H. "Individiial Variations in Arithmetic"; in 
Elementary School Journal, vol. 17, pp. 195-800. 
4- Stone's Arithmetie Test for the Fundamental Operations. Designed as 
a gEoeral test. Copies may be obtained from the Bureau of Publicar 
tJons, Teachers College, Columbia University, New York City. 



■ 






ARITHMETIC 



65 



Refehekce; Stone, C. W. ArCihmeiieat Abiliiiei and Some Foo- 
tOTi Delermining Tkem, (Teacbers Collie Contributions to Educa- 
tion. DO. IS.) Obtained from above address. 
5. Arilhmefie Seala dented by Clifford Woodi/. Copies may be obtained 
from tbe Bureau of Publications. Teachers College, Columbia Uoi- 
venity, New York City. 

Rbfbrence; Woody, Cliffortl, Meaauremcnti of Some Ackieve- 
menit in Arithmetic. (Teachers College Contributions to Education, 
no. 80.) Obtained from above address. 

n. Reabonino Tests 
1. Slone's Reasonijtg Tat. For copies of the test, address Bureau of 
Publications, Teachers College, Columbia University, New York 
City. 

References: Stone, C. W. ArilJimelical Abiliiies and Some Faetori 
Dittrmining Tkem, (Teachers College Contributions to Education, 
no. 19.) Obtained from above address. 

Stone, C. W. Slandardixed Reasoning Teste in Arilkiaetic and Hov) 

lo UlUixe Them. (Teachers College Contributions la Education, no. 

83.) 

S. Starek'e Arilhmeiieal Seate A. Cfipics may be obtained from Daniel 

Starch, University of Wisconsin, Madiaoa, Wisconsin. 

Refgbbncg: "A Scale for Meu.suritig Ability in Arithmetic"; in 

Journal of Edunalional Psychology. April, lOlfi. 
S, B-ackingkam' a Reasoning Test in Arilhmelic. Used by Buckingham in 
the Survey of the Gary, and the Prevocational Schools of New Y'ork 
City. 

References: Seeenleenih Annual Report of ike City Superintendent 
of SehooU, New York City, 1914-lS. 

BuckinghBni.B. R. "Notes on the Derivation of Scales, with special 
reference to Arithmetic"; in the Fifteenth Ytarbook of the Nalional 
Society for tke Study of Eduealioa, p$rl i. 



CHAPTER III 



The problem of measurement in reading. The problem 
of measurement in reading differs in certain important 
respects from that in arithmetic. An arithmetical example 
calls for a definite response from the pupil. All other re- 
sponses are incorrect. Furthermore, our arithmetical sym- 
bols and number system are such that it is comparatively 
easy for the pupil to give unmistakeable objective evidence 
of the character of his response. In the case of reading, 
particularly silent reading, the appropriate response to a 
sentence or paragraph is not so clearly defined, and it is not 
easy for the pupil to give objective evidence of the response 
which he has made. The several attempts to solve the prob- 
lem of measurement for reading are described in the fol- 
lowing pages. 



I. Silent REAnrao 

Silent reading may be classified into two main divisions, 
for each of which tests have been prepared. The jird 
deals with the ability of the pupil to read and know the 
meaning of the words he reads (a) and the second deals with 
the rate of reading and the degree of comprehension shown 
(b). The first is essentially vocabulary; the second is essen- 
tially understanding and speed. 

(a) Recognition of words. A fundamental factor of one's 
ability to read silently is the range of words whose meaning 



READING 67 

he recognizes. Three tests have been devised to measure a 
pupil's ability to associate the appropriate meanings with 
printed words. 

1. The Tkorndike Visual Vocabulary Scaler ' 

Thomdike is the author of three visual vocabulary scales : 
Scale A, Scale As, and Scale B. The latter two represent 
extensions of the former, and were derived by the same 
method. Scale Aj and Scale B are intended for use alter- 
nately or interchangeably, and so a brief description of 
Scale Ai only will lie given here. 

The scale consists of twenty-three lists of words. Each 
list contains ten words. The words in any given list possess 
about equal difficulty as to their meaning. For abihty to 
indicate the meanings of the words comprising any given 
list a certain score is given, and for ability to indicate the 
meanings of the words in the more diflScult lists a correspond- 
ingly higher score is assigned. The score values of the lists of 
words were determined by having several thousand chil- 
dren undertake to indicate the meaning of each word. The 
greater the per cent of children who could not give the 
meaning of the word, the higher the score value attached 
to the word. The method used in giving the tests can be 
made clear by quoting from the opening lines of the test 
sheet: — 

Write the letter F under every word that means a flower. 

Write the letter A under every word that means an animal. 

Write the letter N tinder every word that means a boy's name. 

Write the letter G under every word tliat means a game. 

' Thoradike, E. L. "The Menauretnent of Ability to Read"; in Teach- 
eri CnlUge Record, September. 1914, for Scale A. Teaclurt CtMtgt Re- 
card, November, 1910, for Scale Ai and Scale B. 



68 EDUCATIONAL TESTS AND MEASUREMENTS 

And so on for four additional meanings. Then the twenty- 
three lists of words follow, with the score value of each list 
given on the left-hand margin. 
Three representative lists are reproduced below: — 

Value 4. duisy came] samuel rabbit monkey william 

tulip goat paul violet 
Value 7. constantly sincere chess weak antelope eugene 

henceforth Julian formerly candy-tutt 
Value 10. phlox set dependable judicious caribou orchid 

nitlilesa conpassionate cynis petunia 

Use and standards. The author of the scale recommends 
that, in using it, the examiner should not rate the child at 
the highest point in the scale, where he knows all the words 
in the list, but rather at that point where he knows only 
eight out of ten, A table of values is provided by which to 
infer the proper score where the child misses either more or 
less than two words in the List which best measures his 
acquaintance with words. 

Complete instructions for giving and scoring the test, 
preliminary sheets, and scoring sheets are provided with the 
tests. Permission is also granted to any one wishing to 
use the tests to print them for himself, in which case it is 
recommended that brief test sheets be made up from the 
scales restricting the lists used with each grade of children 
to a few values. Thus much time will be saved, both for 
children and teacher, by eliminating the difficult lists for the 
younger children and the easy lists for tlie older children. 

No standards have as yet been derived by the use of the 
Thorndike Scale Aj or Scale B with large numbers of public 
school children. In Table X the standards of achievement 




READING 69 

by the use of the Thomdike Scale A (with which the values 
on scales Aj and B are supposed to be identical) are given, 
and serve as tentative standards for purposes of comparison. 
The score values were obtained by the measurement of the 
pupils in eighteen cities in Indiana.' 





J 






IV 


y 


VI 


VII 


.,,4 




4.00 

1650 


5,26 
2085 


6.00 

was 


6. OS 
1860 


7, 2D 

leas 













y 



S. The Haggerly Visual VocabulaTy Tests 
The tests prepared by Haggerty, of the Bureau of CotJper- 
atjve Research of the School of Education of the Univer- 
sity of Minnesota, are but a slight modification of the Thom- 
dike scales described above, with the addition of an oral 
test for children of Grades I and II. This test will be de- 
scribed under the heading of " Oral Reading." 

Scale R2, of which there is one sheet for children of Grades 
III and" IV, and another sheet containing part of the same 
words and additional more difficult words for Grades V, 
VI, VII, and VIII, are devised in exactly the same way as the 
Thomdike scales. Methods of scoring are somewhat more 
simple, and the lists are more brief than those used by Thorn- 
dike. Standards are being worked out in the Minnesota 
schools for each grade, but as yet none have been annoimced, 
' Haggerty. M. E. The Ability to lUad: Us Meaiurenieiit and Some Facta 
Conditijining It. (Indiana University Studies, no. Si.) 




70 EDUCATIONAL TESTS AND MEASUREMENTS 



S. Starch's English Vocabulary Tests^ 
These tests are lists of one hundred words, each selected 
at random from a dictionary. The child is asked to check 
the words of the meaning of which he is certain, and to write 
the meaning after the words where he is in doubt. The child's 
score is the per cent of words thus checked or correctly 
defined. 

To illustrate the Starch vocabulary lists, the first one of 
his two is reproduced below. 



Stabch'b English Vocabulabt Test-List ] 



1. acta 


24. currency 


47 


to interlay 


8. abnormal 


25. death 


48 


lUlianate 


3. agriculturist 


96. departmental 


49 


Jupiter 




87. difference 


50 


knowledgeable 


5. Araneida 


28. displayed 


61 


Latin 


Q. BSaaRia 


29. todow 


52 


lewis 


r. awaft 


30. dysodile 


63 




8. barker 


31. eloquence 


54 


Lycoperdon 


9. belleric 


32. epicine 


55 


mange 


10. bizarre 


33. evaporative 


56 


mayonnaise 


11. bonmot 


84. faction 


57 


mesotasis 


18. drible 


35. to flat 


58 


miscue 


IS. butler-cup 


36. forest 


59 


moon 


14. canon 


37. fubby 


60 


musk 


16. Catananche 


38. to gazette 


61. 


neo volcanic 


16. chancroid 


39. gloniou 


62. 


tonotate 


17. to chop 


40. gyral 


63. 


off shore 


18. clearness 


41. hautboy 


C4. 


organdie 


19. collar 


42. heterogony 


65. 


owlet 


20. to comprobate 


43. hordeaceoiis 


66. 


parallel 


21. constructiveness 


44. hyperkeratoaia 


67. 


to peal 


22. to cree 


45. to implore 


68. 


personable 


23. correal 


46. to infatuate 


69. 


to piece 




Starch, Daniel, EducatioTiai HeaauremtnU, p. 38. 





r 

READING 


1 
71 


70. Pluerotoma 




92. tipburn ^H 


71. portrait 


82. ses 


93. to traoafer ^^H 


72. prevailing 


83. sigmoid 


94. to trump ^^H 


73. p roved itor 


84. to sluice 


OS. un beseem ^^^ 


74. quadruple 


85. spadrooQ 


96. upholster i 


75. rapt 


86. spur 


97. vernier 


76. reformer 


87. stipulator 


98. waldgrave 


77. respectful 


88. subregion 


09. wharf 


78. river 


89. sweet 


100. zelotypia 


79. rutter 


90. tarsTis 




80. sawmill 


91. Theatin 





(b) Tests for comprehension and speed. There are three 
types of tests to measure the ability of pupils to read si- 
lently sentences or series of connected sentences. First, 
tests for the understanding of sentences or paragraphs, 
without regard to the time required for that understanding. 
In this class are the Thomdike Scale Alpha, and the Minne- 
sota Scale Beta. Second, tests which measure separately 
the speed of reading and the amount of comprehension. In 
this class are the Courtis English Tests, Brown's Silent 
Beading Test. Starch's Silent Reading Tests, and Gray's 
Silent Heading Tests. Third, tests which combine the fac- 
tors of speed and comprehension in a single mark or score. 
In thb class are the Kansas Silent Beading Tests and the 
Courtis Silent Beading Tests. 

1. The Tkomdike Scale Alpha ^ 
This scale is to measure the understanding of sentences. 
It consists of a series of groups of sentences which the child 
is to read. These groups of sentences or paragraphs repre- 
sent increasing degrees of difficulty, the extent of which has 

' Thomdike, E. L. "AalmprovedScaleforMeaauringAbllity inReud- 
ing"; in Teaeken Coihge Beeord, November, lOlS, and January, 1Q16. 




r 



72 EDUCATIONAL TESTS AND MEASUREMENTS 

been determined experimentally. This difficulty is expressed 
by an appropriate number opposite each paragraph. Each 
paragraph is followed by a number of questions which teat 
the child's understanding of what he has read. The ele- 
ment of speed of reading is disregarded, the child being 
allowed all the time he requires. The following set of 
sentences and questions upon them are copied from the 
scale, and illustrate the sort of ability called for. 

Set C (Score Value, 8) 

Read this and then write the answers. Read it again as often as 
you need to. 

It may seem at first thought that every boy and girl who goes 
to school ought to do all the work that the teacher wishes done. 
But sometimes other duties prevent even the best boy or girl from 
doing so. If a boy's or girl's father died and he had to work after- 
noons and evenings to earn money to help his mother, such might 
be the case. A good girl might let her lessons go undone in order to 
help her mother by taking care of the baby. 

1. What are some conditions that might make even the best 
boy leave school work unfinished? 

3. What might a boy do in the evenings to help his family? 

S, How could a girl be of use to her mother? 

4. Look at these words: idle, tribe, inch, it, ice, ivy, tide, true, lip, 
top, tit, tat, toe. 

Cross out every one of them that has an t and has not any t 
(T) in it. 

A key for scoring b provided by which the examiner is 
guided in deciding which answers to score right and which 
to score wrong. The child's score is the score value of the 
most difficult paragraph concerning which be can answer 
eighty per cent or more of the questions. The class score is 



1 



J 



READING 

the score value of the most difficult paragraph concern- 
ing which the per cent of correct answers made by all the 
class is nearest eighty. A table of values is provided for 
interpolating where the per cent is somewhat above or be- 
low eighty. Table XI gives tiie median scores for the pupils 
in eighteen cities in Indiana as reported by Haggerty. 



Table XI. Median Scobes in Undebatandinq o 

BY THE THOHNDIKB ScAIj; AlPHA 


V Sektencbs 




G^ 




III 


IV 


y 


VI 


Vtl 


vm 




5.48 
1850 


6.56 

M95 


7.66 

eoea 


8.« 
1860 


8,72 
1625 













3. The Minnesafa Scale Beta 

The Bureau of Cooperative Research of the School of 
Education of the University of Minnesota has had printed 
a slightly modified form of the Thomdike Scale Alpha under 
the name Scale Beta. One sheet is prepared for Grades III 
to V, and another sheet for Grades VI to IX. Certain exer- 
cises are common to both sheets, but the most difficult 
ones are omitted for the younger children, and the least 
difficult ones are omitted for the older children. The 
method of scoring is the same as used with the Thomdike 
Scale Alpha. 

3. The Courtis English Tests ' 

As an example of one of the first attempts made to measure 
speed of reading separately from comprehension, mention 

' Courtis. S. A. Manual of Standard Tents; also. The FourUenik Year^ 
boot of the National Society for the Study of Education, parti, pp, 44 to AS. 



1 

I 

i 

1 

i 

4 





74 EDUCATIONAL TESTS AND MEASUREMENTS 

should be made of the Courtis English Tests, a part of which 
involves measuring the number of words read per minute 
when the child knows that he is going to be asked to repro- 
duce the meaning, and also the number of words read F>er 
minute when no reproduction is to be called for. The test 
also provides for a very careful measure of comprehension 
as determined by the amount of the meaning which the 
child can reproduce in composition form. The task involved 
in giving the tests and scoring the papers ia so laborious, 
however, as practically to nullify the virtues of the tests 
because none but the most enthusiastic investigators are 
willing to undertake to use them. Their chief service has 
been as a type after which other much more simplified 
tests have been modeled. 

^, Brown's Sileni Reading Test ' 
This test consists of a very interesting reading selection. 
The directions require that the children being tested read 
the selection silently exactly one minute, then draw a line 
around the word which they have reached when the exam- 
iner calls "stop." The number of words read makes the 
score in speed. 

The children are then asked to write as much as they can 
remember of what they have read. A key is provided for 
the examiner to use in scoring the papers. On the key is 
listed all the separate ideas contained in the selection. By 
comparing the child's papers with the key, the examiner 

ra. H. A. The Meninremenl of Ability lo Bfad. (Bulletin no. 1. 
Bureau of Research, State Department ot Public InaUuction, Concord, 
New Uampshire.) 





BEADmG 



7a 



determines first how many different points there are in 
what the child read. Then his reproduction ia examined 
carefully to determine (1) quantity and (2) quality of com-J 
prehension. Quoting from the J/anuo/, page 15: — I 



A good method of procedure in scoring the papers is to place the 
key and the child'a reproduction aide by side on a table. The first 
point in the key should be examined and then the child'a repro- 
duction should be carefully scrutinized to see if it contains the 
idea. It ahould be borne in mind that the child's language may be 
entirely different from the wording of the idea in the key. The 
purpose of the one scoring the paper should be to see if that idea 
is ejcpresscd in the child's own language or sufficiently plainly 
implied by what he haa written with sufficient completeness and 
accuracy to be credited for quantity. If so it should be scored tor 
quantity. Each point should be considered in the same manner. 
Each idea should nest be examined with reference to quality and 
scored if it meets the requirementa. When the entire paper 
has l)een scored, the number of ideas scored for quantity and the 
nutnlwr for quality should be counted and each expressed as a per- 
centage of the total number in the portion of the selection read 
by the particular child in question. The average of quantity and 
quality should be taken as a measure of the child's comprehension. 

The scorer must use hia best judgment in determining whether to 
give credit tor an idea. Everything which the child says in any way 
related to the idea in question should be considered, and if it seems 
from his language that he has the idea with the required degree of 
completeness and correctness in the case of either quantity or 
quality, it should be credited. 

The record sheet calls for the reduction to one mark, 
termed "reading efficiency," of the two marks of speed and 
comprehension by multiplying them together. Thus, if a 
child reads 1.5 words per second, and the average of his 
quantity and quality of comprehension is 64, his reading 
efficiency is 1.5 times 64, or 96, 

Value of the test in diagnosis. For purposes of diagnos 



76 EDUCATIONAL TESTS AND MEASUREMENTS 

thia teat haa marked advantagea. Not only can the teacher 
determine the standing of his class in general reading effi- 
ciency, but the arrangement of the data on his class record 
sheet enables him also to see at a glance whether his class 
or any individual child in it is up to standard in speed, in 
quantity, or in quality of comprehension. Several equiva- 
lent selections have been prepared so that it is possible to 
use new material when the test is repeated. 

Brown proposes as tentative standards, for purposes of 
comparison, the average scores made by the best class yet 
tested in each school grade. These averages are given in 
Table XII. 



Table XII, Tentative Scores with i 
Reai>inq Test 



ii Brown Silent 





K^ o/riadina 


CmjJreflm- 




Grade III 


3.38 words per aecoad 


la 


187.8 


Grade IV 


3.55 


65 


217.1 


GradeV 


4.10 ' 


SI 


291.0 


Grade VI 


1.54 " 


68 


296.0 


Grade VII 


1.05 " 


B7 


329. S 


Grade VHI 


l.Bl ' 


79 


325.6 



5. Starch's Silent Reading Tests ^ 

These testa do not differ essentially from Brown's test, 

described above. Instead of using one selection for all ages 

of children, the Starch testa consist of a different selection 



' Starch, Daniel. "The Measurement trf Efficiency in Beading"; in 
Journal of Edacaiional Psychology, Joliuary, ISIS. Abo in his Ediltalionai 
Mewmremenig, pp. 22-30. 




for each grade in the elementary school. The following' 
one, which is used for fifth grade, will illustrate: — 

No. 5 

Once upon a. time, there lived a very rich man, and a king b^ 
sides, whose name was Midas; and he had a little daughter, whom 
uobody but myselE ever heard of, and whose name I either never 
knew, or have entirely forgotten. So. because I love odd names for 
little girls, I chose to call her Marygold. 

This King Midas wob fonder of gold than anything else in the 
world. He valued hia royal crown chiefly because it was composed 
of that precious metal. If he loved anything better, or half so well, 
it was the one little maiden who played so merrily around her 
father's footstool. But the more Midas loved his daughter, the 
more did he desire and seek for wealth. He thought, foolish man! 
that the best thing he could possibly do tor bis dear child would be 
to give her the immensest pile of yellow, glistening coin, that had 
ever been heaped together since the world was made. Thus, he 
gave all his thoughts and all his time to this one purpose. If ever 
he happened to gaze for an instant at the gold-tinted clouds of 
sunset, he wished that they were real gold, and that they could be 
squeezed safely into his strong box. When little Marygold ran to 
meet him witii a bunch of buttercups and dandelions, he used to 
say, "Foh. poh, child! If these flowers were as golden as they look, 
they would be worth the plucking!" 

And yet, in his earlier days, before he was so entirely possessed 
of this insane desire for riches. King Midas had shown a great 
taste for flowers. 

The children being tested are allowed just thirty seconds 
to read as much as they can. WTnen the time ia up they make 
a mark after the last word read, and then turn the sheet 
over and reproduce as much as possible of what they have 
read. It is recommended that the test be repeated the suc- 
ceeding day, using the selection designed for the grade below 
the one the child is in. 

The average number of words read per second in the two 



1 



78 EDUCATIONAL TESK AND MEASUREMENTS 

tests makes the child's score in speed. For the score in com- 
prehension, the child's reproduction is carefully examined, 
and all words crossed out which do not represent ideas 
contained in the test passage, or which repeat ideas recorded. 
The remaining words are counted and the average of the 
numbers of words in the two reproductions is used as the 
score in comprehension. 

The author of the tests recommends that when large classes 
of children are being tested, and the results are not intended 
for use with mdividuals, the reproductions be not examined 
but that the whole number of words used be counted and 
reduced by seven per cent for errors. When thus treated the 
tests become wholly objective. 

The median scores made by six thousand children in 
twenty-seven schools are given in Table XIII, 



Or^^ory^r. 


1 


2 


s 


^ 


^ 


« 


r 


« 


Speed ot read- 


















ing (words 


















per second) . 


1.5 


1.8 


2.1 


2.1 


2,8 


3.2 


3.6 


4,0 


Cotapreheo- 


















sion (words 


















written) . - , . 


15 


20 


24 


28 


33 


38 


45 


50 



6. Gray's Silent Reading Tests ^ 

These tests consist of three selections, one for Grades II 

and III, one for Grades TV, V, and VI, and another for 

' Gray, W. S. "Tests of Silent Reading"; in Charles H. Jiidd. Afeaaur- 
ingthe Work of the PuUic Schoah. pp. 273-81. (The Cleveland Foundation 
Survey, 1910.) For other reterences see Bibliography at end of chapter. 



Grades VII and VHI. The selections are so arranged on ' 
the pages tliat the time required to read one hundred 
words can be readily ascertained. Only one child is tested 
at a time. After completing the reading, the child, if in the 
second or third grade, tells the story to the examiner who 
writes it down. If in the grades above the third, the child 
writes all he can remember of the story, and then writes an- 
swers to a set of questions which is furnished him by the 
examiner. The child's score for quality of reading is assigned 
on the basis of two factors, reproduction and accuracy. 
Reproduction is determined by the number of words which 
remain in the child's composition, after all wrong or irrele- 
vant statements and repetitions are stricken out. Accuracy 
is determined on the basis of ten points for each correct 
answer. The quality mark is the average of these two. Thus 
a score for rate of reading and a score for quality of reading ] 
is ascertained for each pupil. Standards are given in Table 1 
XIV. 1 

Table XIV. Standarh Scores for Ghat's Silent Rkadino ' 

Tests j 


Orade 


s 


s 


^ 


* 


« 


r 


a \ 


Bate (words 

peroecond). 
Q>iality 


1.50 


2.30 

37 




a. 57 

St 


a. 79 

39 


2,69 


2. 87 
27 


7. The Kamas Silent Reading Tests ' ' 
The Kansas Silent Reading Tests take both speed and 
accuracy of comprehension into account in a single mark. 
> Kelly. F. J. "The Kansas Silent Reading Testa"; >q Journal of Bdtf 



80 EDUCATIONAL TESTS AND MEASUREMENTS 

Each test — No. 1, for Grades III, IV, and V, No. «, for" 
Grades VI, VII, and VIII; and No. 3 for the hifih school 
— consists of sixteen esercises, of which the following, 
intended for Grades VI, VII and VIII, will serve as illus- 
trations : — 



No. 6. 



No. 14. 
Value, 



The air near the ceilinK of a room is warm, while 
that on the floor is cold. Two boys are in the room, 
James on the floor and Harry on a box eight feet high. 
Which boy has the warmer place? , . . 

In going to school. Jamea has to pass John's house, 
but does not pass Frank's. If Harry goes to school 
with James, whose house will Harry pass, John's or 
Frank's? . . . 

A list of words is given below. One of them is needed 
to complete the thought in the following sentence: 
The roads became muddy when the snow. . . . Do 
not put the missing word in the blank space left in the 
sentence, but put a cross below the word in the list 
which is nest above the word needed in the sentence- 
water 

melted 



The supposition is that the child's comprehension !s 

tested by each exercise. And since he does not pass on to the 

next exercise until he has indicated his comprehension of 

each one, the sum of the values attached to the esercises 

which he can complete correctly in the five minutes allowed 

is thought to give a fair measure of his speed of reading 

combined with his ability to comprehend the meaning of 

the esercises. 

cationa! Psychology, February, lUia; also, Bulletin do 3, Bureau of Edu' 
tional MeosuremenU and Standards, State Normal School, Emporia, 

Kansas. 



READING 81 

Score values for flie Kansas tests. The score to be givea 
for the correct doing of each exercise is indicated in the 
margio to the left ot the exercise. This value was determined 
by the length of time children of a given grade did actually 
require on the average to do each exercise correctly. 

Standard Median Scores. These tests have been used 
widely and very reliable standards have l>een established 
for the several grades. In Tabic XV the median score for 
each grade is given, and also the twenty-five percentile, or 
that mark below which the poorest one fourth of the chil- 
dren of the grade fall, and the seventy-five percentile, or^ ^h 
that mark above which the best one fourth of the chUdi«'t^| 
of the grade fall. ^| 

Table XV. Median Scobeb, Kansas Silent Readino Tbotb 

{Bated upon more than 100,000 latrei) l 




Gr^di 


-T," 


Gr,.dt 


arndt 


VII 


vns 


'tf 


CmJt 


.IV ' 


» 


■MuMen... 


ft.8 


S.B 


U.B 


13.B 


u.a 


IS. 3 

»l.4 


aa.fl 

30.4 


3B.6 


M.6 


3S.7 


8. The Courtis Silent Reading Tests » 
These tests are of the same general type as the Kansas 
Silent Reading Tests. In them, however, the several exer- 
cises make a connected story. The response called for in 
all the exercises is the same, namely, to draw a line around 

> Courtis, S. A. Standard Research Teili in Silent Reading. 8i Eliot 
t Street. Detroit, Michigaa. 



82 EDUCATIONAL TESTS AND MEASUREMENTS 

a word. PurtlieriBore, the exercises are selected with a view 
to their equality in respect to difficulty, and so no attempt 
has been made to assign values to them. The score of the 
child taking the lest is the number of the exercises which he 
can complete correctly in the time allowed. 

No standards have yet been obtained by the use of these 
tests. 

II. Ora.l Reading 

Silent reading represents one part, and quite a large part 

of a child's ability to use reading as a tool. Oral reading 

represents another part, and for this abo a number of tests 
have been devised. . 

i 

1. The Jones Visual Vocabulary Tests ' H 

Jones sought a means of measuring the vocabulary of 
primary grade children. Selecting ten of the most widely used 
primers, he found the frequency of occurrence in all the 
primers of each word occurring in any of them. He used this 
frequency as a measure of the value of each word. Thus a 
word occurring one hundred times would count twice as 
much in a child's score as a word occurring only fifty times. 
Using the values thus determined for each word, lists of words 
were made up as tests, and a chOd's score in the test is the 
sum of the values attached to the words which he can pro- 
nounce correctly. 

' Jonea, R. G. "Standard Vocabulary (or Primary Grades"; in The 
Fovrleitilh Yearbook of Ike National Society Jot Iht Stud]/ of Educalion, 



_ 



£. The Haggerty Visual Vocabulary Tests ^ 
For Grades I and II these consist of two sheets, one of 
sight words and the other of phonetic words selected from 
the Jones test. The words on either sheet are grouped into 
lists according to difficulty. This difficulty was determined 
by trial with several hundred primary children. Values 
are attached to each word according to its ascertained diffi- 
culty. The child being tested is asked to pronounce the 
words aloud, and his score is the value attached to the most 
difficult list of which he can pronounce four out of five 
words correctly. 



S. Gray's Oral Reading Test ' 
Nature of the tests. This test consists of eleven para- 
graphs, arranged in order of increasing difficulty. The 
relative difficulties have been established experimentally. 
Below are reproduced the first, sixth, and eleventh para- 
graphs, as illustrations. 

1. A boy had a dog. 

The dog ran into the woods. 
The boy ran after the dog. 
He wanted the dofi to go home. 
But the dog would not go home. 
The little boy said, 

"I cannot go home without my dog." 
Then the boy began to cry. 

' Haggerty, M, E. "Scales for Reading Vocabulary of PcimaryChil- I 
dten"; in The EUmeidary School Jovrnal, vol. 17, no. 2. (October, 1918.) • 

• Gray. W. S, "Oral Reading Test"; in Charles H. Judd, Jtfeoiurinj I 
the Work of the PvlAie Schooli, pp. ae3-7B. (Cleveland Foundation Survey, 
ISIO.) See Bibliography at end of chapter for additioaal refereoceB. 



84 EDUCATIONAL TESTS .\ND MEASUREMENTS 

6. It was one of those wonderful evenings such as are found 
only in this magnificent region. The sun had sunk behind 
tike mountains, but it was still lighL The pretty twilight glow 
embraced a. third of the sky, and against its brilliancy stood 
the dull white masses of the mountains in evident contrast. 

II. The hypotheses concerning physical phenomena fonnulated 
by the early philosophers proved to be inconsistent and in 
general not universally applicable. Before relatively accu- 
rate principles could be established, physicists, mathemati- 
cians, and statisticians had to combine forces and work 
arduously. 

How the test is scored. The plan of administering the 
test is rather complicated. The child's oral reading of each 
paragraph is checked for time, and for each of six types of 
errors; namely, gross errors, minor errors, omissions, sub- 
stitutions, insertions, and repetitions. These errors are de- 
fined io the manual. The credit given for reading any para- 
graph varies inversely with the time and inversely with the 
number of errors. For example, certain credit is given a 
second-grade child for reading paragraph 1 in forty seconds 
with less than five errors, and additional credit is given 
the same child for reading the same paragraph in thirty 
seconds with less than five errors, or in forty seconds with 
less than four errors. Still difl^erent credit is given to third- 
grade children for each of the above achievements with 
paragraph I. When the combination of length of time and 
number of errors exceeds a certain prescribed maximum, 
no credit is allowed. The score of any child is ascertained 
by adding together all the credits which he has earned on 
the several paragraphs. This process becomes much more 
simple than it sounds here when the blanks for recording 



the detailed data for each child and for tabulating results 
are at hand. 

III. An Estimate of the Valttb of the Several 
Reading Tests 

Before passing to a discussion of the chief uses of the read- 
ing tests described above, it may be well to point out some 
of their distinguishing virtues and at the same time some of 
their limitations. Let it be remembered at the outset that 
all standard test work in the field of education is still in the ■ 
pioneer stage, and that most authors of tests, if not all, 
realize more keenly than their critics how far short of per- 
fect instruments these tests now are. 

Criteria for estimating values. As a basis for judging the 
usefulness of the present available reading tests, four of the 
commonly accepted criteria for determining satisfactory 
educational tests in any subject are here given. 

(1) The test must be objective. That is to say, there 
must be room for a minimum of opinion when rating the 
results of the test. A perfect test in this regard would be one 
which when answered by any child would be rated alike by 
all competent judges. Perfect objectivity probably cannot be 
hoped for in reading tests, but it is one of the essential things 
to be sought in a test and it is worth making considerable 
of a sacrifice to approach. Without objectivity it is quite 
impossible to use a test to compare the standards of achieve- 
ment from school to school unless the papers from all the 
schools are rated by the same judges. Since this is not prac- 
ticable, and since one of the chief uses of standardized tests 
is to make possible comparisons among schools, objectivity 



86 EDUCATIONAL TESTS AND MEASUREMENTS 

comes to be of the Brst importance as an essential of a satis- 
factory test. 

(2) The test must be arranged in steps along a scale whose 
units are equal. The difference in amount between a score ot 
six and a score of eight must be equal to the difference in 
amount between a score of thirteen and a score of fifteen. 
The whole principle of comparative measures depends upon 
the equality of the steps all along the scale. 

(3) The test should measure the achievement or ability 
which it purports to measure. No silent reading test meets 
this requirement, for example, unless the children who get 
the best scores in it are the ones who are the most efficient 
silent readers in their regular work, whether in school or 
out of school. No spelling test meets this requirement un- 
less the children who get the best scores in it are those whose 
regular written work is most nearly free from misspelled 
words. 

(4) The test must be brief and simple if it is intended for 
use by public school teachers and superintendents. To carry 
conviction as to its validity and importance, a test must be 
easily comprehended by the teacher, and the directions for 
giving the test and tabulating the results must not be too 
involved. Since the test is not an instructional device, but 
an instrument for measuring the resiilts of instruction, it 
cannot call for any large amount of time from either the 
teacher or the pupils. 

These criteria applied. W^en examined in the light of these 
four essential criteria the strong and weak points in the sev- 
eral tests described above stand out very clearly. Both the 
Thorndike and the Haggerty visual vocabulary tests are 



READING 87 

almost perfectly objective, and the equality of the steps 
along the scale is assured by the method which was employed 
in deriving the tests. As to whether these tests measure 
only what they purport to measure, namely visual vocabu- 
lary, a question has been raised ^ but has not been answered. 
In reading, visual vocabulary is used not with words in iso- 
lation, but with words in their setting in sentences. How 
nearly a score obtained by a test using the words in isolation 
is a, measure of the ability to recognize those same words in 
sentences is not known. The Jones tests are made up in both 
ways, one listing the words in isolation, and the other using 
the words in sentences, in recognition of this uncertainty. 
It may be doubted, too, whether a child should be expected 
to recognize such words as " eugene " and "jesse" when 
they are not capitalized. Finally, these vocabulary tests 
are simple and can be made brief, but the method of obtain- 
ing the class average on the record sheet is rather tedious, 
and will probably tend to limit materially the usefulness of 
the tests. 

Turning to the tests for silent reading we find most of them 
far from perfect in objectivity. For scoring the papers in the 
Thomdike Scale Alpha, the Minnesota Scale Beta, and 
Brown's Silent Reading Test, elaborate keys are provided 
for the guidance of the scorer, while with Starch's tests and 
Gray's tests the scorer must use his judgment as to the cor- 
rectness of each unit of the child's reproduction. With as 
large aaubjectivefactor left in the scoring as these tests allow 

' Jones. H. G. "Standard ViX'abulary tor PrimBry Gradea"; in Four- 
leentk I'earboolr 0/ the National Society for the Study of Education, part I, 



88 EDUCATIONAL TESTS AND MEASUREMENTS 

it is not safe to base too much upon a comparison of the re- 
sults in one class with those in another, unless the papers in 
both cases are scored by the same person, or by persons 
trained for the special task. While this is true, these tests 
are nevertheless far more objective than the examina- 
tions in common use, and in the hands of a trained investi- 
gator their imperfect objectivity is not a serious fault. 

Considered from the standpoint of what they measure 
these tests all present normal reading situations, and noth- 
ing in our whole educational endeavor is so important, so 
far as formal instruction goes, as to be able to determine 
how rapidly, and with what degree of comprehension, school 
children can read such selections as are presented. It may 
be questioned, however, whether a child's ability to reproduce 
in composition form the ideas he has gathered from the se- 
lection is not an abiUty quite separate from the ability to 
comprehend the meaning. This criticism holds but little 
against the Thomdike Scale, but must be considered seri- 
ously in connection with the tests of Brown, Starch, and 
Gray. It is to avoid this doubtful method of procedure that 
the Kansas Silent Reading Tests go to such a length as they 
do to reduce to a minimum the element of reproduction. 

It may be said in this connection, however, that in avoid- 
ing much reproduction, the Kansas Silent Reading Tests 
have incorporated within themselves another fault, the seri- 
ousness of which has yet to be calculated. This fault is that 
the exercises thus made up so as to call for a minimum of 
reproduction partake somewhat of the nature of puzzles, 
and therefore do not represent normal reading difficulties. 

One other consideration must be kept in mind in coonec- 



"I 



BEADING 89 

tion with the tests of Brown and Starch. The time involved 
in scoring the papers is considerable. Because of the uncer- 
tainty of oniformily of scoring in the hands of untrained 
persons, teachers cannot do the marking of the papers very 
satisfactorily, and so these tests seem to be useful mainly in 
special investigations where tlie material is to be handled 
by experts. The same conclusion also seems to hold with 
reference to Gray's Silent Reading Tests, and for the ad- 
ditional reason that the time required to test a class, one 
child at a time, is so great as to prevent general use of the 
testa by teachers. 

The Kansas Silent Reading Tests and the Courtis Silent 
Reading Tests are designed to be put into the hands of the 
class teacher. They are very simple to administer, take but 
little time, and are objective. The serious question about 
them was pointed out above. 

The Jones Vocabulary Tests are deficient In respect to the 
vahies attaching to the various words. The fact that " the " 
occurs 1733 times in ten primers, wliile "that" occurs 176 
times, is scarcely a sufficient reason for scoring "the" at 
ten times the value of "that." 

Gray's Oral Reading Test is a splendid instrument, but it 
is serviceable only in the hands of one carefully trained in 
its use. The method of testing one pupil at a time, which 
oral reading seems to require, makes the element of time a 
serious consideration at best, and this test which depends 
upon such elaborate marking of each paragraph read is quite 
burdensome. The test is objective, however, and the results 
obtained are very reliable measures of oral reading. 

When teachers shall have acquired more generally in their 



90 EDUCATIONAL TESTS AND MEASUREMENTS 
professional equipment a knowledge of standardized testa 
and a facility in their use, many of the above criticisms 
will be rendered invalid. _ 

IV. The Service of Reading Tests ^R 

Service to the superintendent. From the importance or ' 
reading in the general efficiency of all school work we may 
assume that the superintendent is vitally interested in 
making the instruction in reading most effective. What can 
reading tests reveal to him? 

First, they can satisfy hira and his teachers of the general 
status of reading in his district. It is easy for any superin- 
tendent to carry conviction among his teachers that the 
results in reading are not satisfactory in his district if he can 
show that among a group of a dozen or more neighboring 
cities his district stands low. The extent to which it stands 
low becomes a measure of the renewed earnestness needed 
in attacking the problem of improvement. 

It is difficult for one to carry in mind a fixed standard of 
achievement. One gradually thinks more and more in terms 
of what those aroimd him are achieving. It would have been 
quite impossible, for example, to convince the superintend- 
ent and teachers of the Anglo-Korean school at Songdo,' 
without a standardized test, that the children in their fifth 
grade could do, on the average, reading work valued at only 
3.8 units, while American children who had been in school 
only the same mmiber of years could score 13.2 units, or that 

' Wflsson, Alfrfid W. " Rpport of an Experiment in the Use of Ihe Kansas 
Sileat Beading Teats with Korean atudenls"; in Edaealional Adminitira' 
(ion and Supermion, vol. 3, p. 88. 



J 




HEADING 



their sixth grade could accomplish only as much as the 
American third grade. It meant much to that school for 
its superintendent and teachers to be able to measure their 
school by the American standards. 

As an illustration of this divergence of standards among 
schools of different types in this country, and among schools ■ 
in different sections of the country, Table XVI is given. - 
Test I was used in Grades III, IV, and V, in city schools, 
but in Grades III, IV, V, and VI in country schools because 
the elementary-school course is divided into nine grades in 
the country schools. Test II was used in the upper grades . 
in both city and country. 

The determination of status must extend beyond a gen- j 
eral measure of reading ability. What is developed under 1 
the name of reading is in fact a complex of many abilities. 
This was most strikingly brought out in Cleveland, Ohio, by 
the recent siirvey which disclosed that while the city was 
uniformly high both in oral reading (pronouncing the suc- 
cessive words of a paragraph) and in speed in silent reading, 
it was uniformly low in the quality of reading as evidenced 
by the ability of the children to reproduce what they had 
read. Evidently the end which Cleveland was seeking to 
accomplish in reading was different from the end sought in 
the group of cities with which her work was compared, 
(See Fig. 23, page 297.) Be it remembered that the tests 
do not disclose which is right, Cleveland or the other cities, 
but they do disclose the difference, and in the difference lies m 
a problem which could not have been intelligently attacked I 
without a knowledge of the facts revealed by the tests. 

Reveals wrong emphasis in teaching. Differences in t 



^ 



I 




OS IDHCATIONAL TESTS AND MEASCEEMENTS 







Gradi 




in 


IV 


" 


VI 


Vll 


Vlll 


IS 


Firat-elaas cities 


Median 
Number of 
children 


4.3 

1873 


8.8 
8017 


13.1 

1810 


13.8 
1590 


10. 1 
1546 


19,7 
1384 




cities in Kansas 


Median 
Number of 
chUdien 


5.9 
968 


0.7 
1067 


14.3 
994 


14.3 
1024 


17.8 
813 


20.6 
596 




Third-clttBS citiea 
in Kansas 


Median 
Number of 
children 


1.8 

373 


8.e 

624 


11.8 
471 


12,5 
518 


14.0 

852 


20. Q 

560 




Kansas total... . 


Median 


4.9 


9.0 


13.4 


13.7 


16.1 


20.1 


m 


Iowa total 


Median 
Nmnber of 
children 


e.2 

2371 


0.5 
2940 


14.8 
2695 


14.8- 
2S97 


17.7 
2143 


20.6 
1819 


\ 


Total From Far- 
Western cities. 


Median 
Number of 
chJdren 


e.i 

228« 


10.6 

asoo 


14.4 

«643 


15.0 

2673 


18.0 
2508 


20.6 
2075 




Thirty-five one- 
room schools 


Median 


3.0 


7.0 


8,7 


11,8 


11.0 


15.0 


18.9 


Cities in the 
Soutbem 
States 


Median 
Number of 
children 


4.7 
6SS 


8.4 

723 


12.3 

702 


11,8 
602 


15,4 
498 


19,2 
350 





reading work done in the several buildings within a city may 
be aa striking as differences among citiea. In a certain Middle 
Western town a forceful principal of one of the ward build- 
ings has dominated the wprk of the building for a good many 
years. The reading of the building was his particular pride. 
When tested for silent reading ability his children scored in 




READING 



every grade but little more than half what the children in 
another building scored where the work was reputed to be 
"much less thorough." These results were made the basis 
of dehberatioDS among the teachers as to the legitimate out- 
comes of reading, with the result that, without diminishing 
any one's zeal, the emphasis was transferred from oral word- 
pronouncing to silent thought-getting in the building where 
this strong principal dominates the work so effectually. 

This is but one illustration of the use to be made of tests in 
the work of supervision. The com-se of study, both in its 
broad aspect and in its details, and the time allotment for 
the various phases of reading, should be made mainly in the 
hght of measurable results. Supervision of the instruction 
in reading may well be done in part on the basis of what has 
been revealed to be the particular needs of a given room, 
or even of a given child. Such questions as the following may 
frequently be asked; — 

Is meagemess of vocabulary at the bottom of this diffi- 
culty? 

Is this type of reading preparing the children best for un- 
derstanding their history or science work? 

Are the children reading aloud those things which can best 
be appreciated when read aloud? 

In short, the superintendent can now think in terms of 
standards upon such questions as the amount of attention 
to give to reading as a whole, and how much to each phase 
of reading both in his entire district and in each unit of his 
school system. 

Service to the teacher. By far the largest service of stand- 
ardized tests is being rendered to the teacher. Not only are 



fl 



M EDUCATIONAL TESTS AND MEASUREMENTS 

they enabling the teacher to check up his conception of what 
can justly be expected of children, but they are Indehbly 
impressing upon his mind the absolute need for recognizing 
the individual differences among his pupils in respect to each 
problem of learning. Some child who reads well orally from 
his reader because he confines his study of reading very 
largely to the daily lesson assigned, may be doing poorly in 
geography or in the problems of arithmetic. These content 
subjects depend upon reading, but not upon the sort of oral 
word-pronouncing which still too largely characterizes our 
reading periods. This particular child needs a different sort 
of reading. He would be found to stand low, probably, in 
vocabulary; probably low in quality of silent reading. If the 
teacher has before him a chart upon which is recorded the 
standing of this child in the various aspects of reading, he 
will no longer assign for his study the next page or two in 
the reader. He needs the sort of reading which widens his 
vocabulary more rapidly, and centers his thought upon mean- 
ing instead of upon words. Instruction from the third reader 
should not be for the purpose of preparing children to read 
the fourth reader. Reading is a tool, and its use in the con- 
tent subjects is the proper test of its efficiency. As pointed 
out in the report of the Survey Commission, it is probably 
more than a coincidence that in the Cleveland schools the 
per cent of failures in reading rapidly decreased from the 
lower to the upper grades, while failures in geography, his- 
tory, arithmetic, and grammar mostly increased. Where 
the stress is upon oral reading and speed of silent reading 
rather than upon thought getting, as the tests show the case 
to be iu Cleveland we cannot expect results in the content 



READING 

subjects to be veiy satisfactory. Therefore a teacher, with 
a fairly complete diagnosis of the reading abilities of his 
pupils before him, cannot fail to take into account the vary- 
ing needs of the individuals when directing the study of his 
class. 

As an illustration ^ of the aid of reading tests in such diag- 
nosis, the case of the Training School at Oshkosh, Wiscon- 
sin, may be cited. During a summer term of only six weeks 
pupils, by use of the Kansas Tests, the Gray Oral Test, and 
the Gray Silent Reading Tests, had their difficulties local- 
ized. Instruction was then given upon the points revealed to 
be needing attention. Twenty out of one hundred and five 
children were given different instruction from that given 
the class as a whole. Surprisingly greater resiilts were ob- 
tained in the case of those children whose instruction was 
specifically adapted to their difficulties. 

Service to the child. Since the beginning of schools chil- 
dren have been sent to school to be taught. That being the 
case, they wait to be told what to do, and there is the end 
of their responsibility. When the end of the month comes 
they look to theh report card for a measure of their suc- 
cess in doing what they have been told. 

A function of standardized tests, by which the child can 
measure his own achievements about as successfully as 
the teacher can, is that they bring the child into partnership 
with the teacher in directing the whole educative process for 
the child. If the child discovers by actual trial that he has 



(December, 1910.) 



I 



96 EDUCATIONAL TESTS AND MEASUREMENTS 
only three fourths as large a vocabulary as children of his 
grade the country over, or tliat he reads only three fourths 
as fast, he can be depended upon better to cooperate in over- 
coming the fault than when he is simply given a card every 
month with 70 assigned to hb reading. Particularly is this 
true if he feels that at the end of a given period he can 
take his own measure again to ascertain his gain. Children 
should be enlisted with the teacher in the effort to select the 
most needful sorts of materials for their study. ^Tiere one 
child needs problem solving, another needs a story, while 
still another needs something else than reading of any 
kind. 

Remedying the situation revealed. As results of reading 
tests are reported more and more widely, the conviction is 
gaining that we are not giving a due proportion of attention 
to silent reading. Children who read orally fluently arefound 
often to master a rather meager portion of what they read. 
In fact it is believed that habits of reading which are estab- 
lished by the too exclusive use of the oral type of reading 
frequently work to prevent the adequate development of 
silent reading abihty. Nevertheless teachers give but little 
conscious attention as yet to the development of this thought- 
getting ability or ability to read silently. In our schools most 
reading work still consists of oral expression. 

Because of this growing conviction it seems proper to set 
down a few of the simplest hints for turning the major em- 
phasis to silent reading. Since the Kansas Silent Reading 
Tests have been more widely used than others, thus es- 
tablishing a more reliable standard, and since they are so 
very simple for the teacher to administer, results obtained 




BEADING 

trom tlieir use will be regarded 8S a starting-point for these J 

suggestions. 

Suppose now that the teacher has given the test to his | 
class. The first need is for some simple practical device 
by which he may make a comparison of his class with the ' 
standard scores for the same grade of pupils. To assist in 
doing this the table on page 8 may be used. Here are given 
the standard scores for each grade from the third through 
the twelfth. In this table are also given the twenty-five per- \ 
centile (that point below which the lowest one fourth of ' 
the scores fall), and the seventy-five i>ercentile (the point 
above which the highest one fourth of the scores fall). By 
comparing the scores of one's own class with these marks, 
it is easy to determine in what respect one's class is different 
from the standard. It should be noted here that while the 
score of any child as recorded in the test combines speed and 
accuracy, the teacher may ascertain if he chooses, by exam- 
ining the paper, whether the child read rapidly and made 
many mistakes, or read slowly and accurately to obtain the 
score which is recorded. 

Types of situations revealed. The situations revealed by i 
the teat fall into three types: ■ — 

First, the class may correspond closely with, or surpass 4 
the standard distribution in both median score and vari-' ] 
ability; 

Second, the class may have a low median but satisfactory 1 
variability; or 

Third, the class may have a satisfactory median but too J 
wide variability. 

In connection with each type of situation certain sugges- 
tions may be considered. 



L 



98 EDUCATIONAL TESTS AND MEASUREMENTS 

A normal situation. Suppose first, then, that a teacher 

finds his class to correspond closely with the standard dis- 
tribution. The first thing to bear in mind is that these 
standards have been derived from the measure of actual 
achievement in schools of all sorts and do not necessarily 
represent ideal conditions of reading abiHty. Fm'thermore, 
silent reading has not had its proper share of attention. Cer- 
tain classes and certain entire school systCEns have been able 
to achieve median scores very much higher than these stand- 
ard medians and distributions with much less variability. 
It is a well -recognized fact, furthermore, that a very great 
waste in effort will continue in teaching silent reading and 
all the subjects dependent upon silent reading until we secure 
in general much less variation in ability among the members 
of a class than is now represented in the standard distribu- 
tions. For example, the median score of the poorest one 
fourth of fifth-grade children is 7, while the median score of 
the best one fourth of fifth-grade children is 20,4. This 
means that while one fourth of a normal class can read one 
page another fourth of the class can read nearly three pages 
with the same degree of comprehension. Class instruction 
b"sed upon common assignments of reading tasks must be, . 
under such circumstances, moat wasteful. If such assign- 
ments are well adapted to the slower pupils, they cannot 
at the same time bring out the best efforts of the strong ones. 
It behooves a teacher, therefore, to imdertake to secure a 
distribution much less varied than the standard, at the same 
time he raises his median score. Suggestions for accomplish- 
ing these two things will be made in the following paragraphs. 
To raise the median score. Suppose next that the test 



READING 

has revealed an unsatisfactory median score, but a satis- 
factorily close grouping of the scores around the median, 
and no individuals varying strikingly from the median. 

As an indication of how class medians vary among classes 
in the same grade Table XVII is given. From this table a 
teacher may know that if the naedian of his class, say fifth 
grade, falls below 11.89, his class is among the lowest one 
fourth of fifth-grade classes as judged by tbis test. In case, 
then, the teacher finds that his situation demands a raising ] 
of bis median score the following suggestions may aid, 



Table XVU. 



jClabs Medians by ti 
Reading Tests 



I Kansas Silent 



Grad^ 


Mw™ 


ir-ightil one faurlh 


ffamSn- q^ rla«» 




3.19 
7.6 
11,89 
11.78 
U.5 
17.06 


6.57 

n.fi 

15.36 
16.33 
18,78 
28.72 




4th 


136 



















Tbe lowoE am (ourlb d{ cIiuu in«Iinna (all l>tluw tbc li'tl-houd Ggun, bdcI tlie bigbot 
one fourth of l^Iuh omliAJis tall above thf rigbt-hoDd fiifUR. 

Overemphasis on oral reading. The commonest of all 
reasons for this situation, particularly when found in the 
grades below the sixth, is that the teachers have been placing 
chief stress upon oral reading. WTiere children are required 
to give their attention mainly to the correct pronunciation 
of words, the correct enunciation of sounds, and tlie correct 
inflection of the voice in passing over the several punctua- 
tion marks, not much growth in the power to comprehend 




100 EDUCATIONAL TESTS AND MEASUREMENTS 

meaning in the language can be expected. Where the chil- 
dren study their reading lesson with the point of view of 
being able to respond in this way, they fasten upon them- 
selves the habit of watching for words whose pronunciation 
they are not sure of, or they form the habit of reproducing 
the sounds of syllables, thus establishing the practice of 
moving the lips and other speech organs when reading si- 
lently- Frequently both these habits fix themselves upon 
children whose reading is judged mainly by the daily oral 
performance. When either or both habits become fixed a 
real struggle is required to break them. Unleps they are 
broken, however, the child suffers a severe handicap the rest 
of his reading life. Many men and women of mature years 
are still paying the price of those habits fixed in youth. They 
read but little faster silently than they can pronounce the 
words orally, because their speech organs make all the mo- 
tions of the successive words as the reading proceeds. 

Care from the beginning. To be on guard against these 
two habits care must be exereised from the very beginning. 
Children in the primary grades should have exereisea from 
the start in which the meaning is the only significaut ele- 
ment, and the response is not in terms of words said, but 
things done, or interpretations made. For example, let it be 
the usual thing for the child to carry out the directions con- 
tained in the word or sentence. The primary teacher should 
be supplied with some hundreds of cards upon which such 
sentences or short paragraphs bm the following are printed 
or written : — 

(1) Draw a picture of a flag on the blackboard. 

(2) Make a sound like a cross kilty makes when a dog chases her. 




I (4) Pla: 

I toa 



Hide behind the dooi. ,■ 

Play that you are carrying a cwii full of water and do not wish 
apill any of it. 

These cards should be graded in such a jvfiy.that certain 
ones will contain only the words taught in the fir^t rradtng 
lessons. As more words are learned, more cards will becorjc 
available. Word drills should then divide time very gen- 
erously with these practice cards which emphasize attention 
to meaning. 

Variety in handling the exercises may be introduced in 
scores of ways which will readily occur to a resourceful prim- 
ary teacher. Many other devices having the same aim will 
also occur to the teacher. The essential thing is that prac- 
tice in translating written or printed language into action 
instead of words should be started early, thus producing 
the habit of advancing through a. paragraph by thought- 
units rather than by letters, syllables, or words. 

Readily above the primary grades. In grades above the 
primary the problem is fundamentally the same as stated 
for the primary, but the devices must vary. 

First, whenever reading b done orally, be sure that what 
the child is reading is new to most of his listeners. Be sure, 
too, that the other pupils are listening, and not following 
along with the reader in another copy of the same book. No 
method of reading is more faulty in intermediate grades than 
that in which other members of the class are watching for 
a word error of the reader, ready to call attention at once to 
Biich a mechanical mistake. This method centers the atten- 
tion of the reader constantly upon the mechanics and 
develops the habit of attending first to the thought. WhereaSi 



I 




imentI^^H 



108 EDUCATIONAL TESTS ANB MEASURE! 

if the reader realizes that his- hearers know Dothing of the 
content of his selection "esii^t 'what they gather from his 
reading, then gi^ins Qie thought instead of pronouncing the 
words btoJmeS-.thfe controlling factor in his consciousness. 
It ft^lijwj' f rom this that only selections, the thoughts in 
. W^'ch'are vital to children, should be used as subject-mat- 
ter for such reading. Then let the one who has read such a 
selection defend the selection against questions or criticisms 
of the class. In short, center attention upon the meaning, 
even at the expense, if necessary, of accuracy in pronuncia- 
tion, enunciation, and expression. 

Second, let the amount of reading which is conipellingly 
interesting be increased. Supplementary reading in geo- 
graphy, history, science, and literature should be given a 
larger place. Require that the reports made upon such 
readings be rather exact, but let the selections be reasonably 
easy for the children. Gain in facility in silent reading can- 
not be secured by holding the children to selections which are 
so difficult that word-troubles absorb all the attention. One 
must be able to go with ease through the successive thoughts 
before the habit of attending to the thought can be ac- 
quired. 

Third, make all the industrial and playground exercises 
give a far greater measure of service in teaching reading 
than they now commonly give. How singularly short- 
dghtedwe are to ask a child to follow the directions printed 
in his arithmetic for finding the per cent that one number is 
of another, but employ a teacher to give orally the direc- 
tions for playing a new game, making a raffia basket, or 
pknting beans. The very things which come nearest the 



w 



READING 



natural interests of the children, concerning which they would 
most zealously read if they had the paragraphs containing 
the needed directions, are given to them orally. When inter- 
esting school exercises require a carefuJ following of direo 
tions, then those directions maice the most effective silent 
reading material. But in practice we seldom make use of 
them. This fault is due to a failure to imderstand the dis- 
tinction between the aim of the intermediate grades and 
the H'"\ of the upper grades. If we realized that all the work 
of the intermediate grades should be made to develop skill 
in using the tools of learning, then we should not conduct 
these exercises without making them aid in teaching reading. 

Reading in the upper grades. Passing now to the situa- 
tion presented when the score of a class above interme- 
diate grades is found to be low, we have the most serious 
task of all. The junior bigh-school or upper-grade pupil 
should be able to proceed with his school tasks without 
much attention to the tools he is using. It is not the 
primary function of this department of the school sys- 
tem to increase the children's facility in the handling of 
these tools. However, success in nearly all the tasks under- 
taken in the upper grades depends upon the skill which the 
children are expected to possess in the tool subjects. A 
compromise is, therefore, necessary, if children in the junior 
high school, or seventh and eighth grades, are found de- 
ficient in their ability to read silently. A few suggestions are 
here offered in the hope that some help may come from them, 
although it is realized that correcting reading faults at this 
I stage is very difficult. 

First of all. the children's own conscious efforts should be 



103 ^^B 

nter- ^^] 

4 



104 EDUCATIONAL TESTS AND MEASUREMENTS 

obtained in the direction of correcting the faults. Then, too, 
the teacher should see that he is observing the same funda- 
mental principles stated for the intermediate grades. Com- 
prehension, and not mechanics, must be made the test of 
all reading, whether in history, science, or hterature. The 
material selected for use must be sufficiently easy so that 
the children are not tied up in word or language difficulties. 
Again, to overcome the habit of proceeding by too small 
units, practice must be afforded in advancing by short 
sentences or phrases. 

In case the trouble seems to be that the children read 
fluently enough orally, but get little of the thought, intro- 
duce a great deal of the sort of reading requiring close atten- 
tion to the thought. For example, use rule books for football, 
basketball, and the like for those interested in games; catalog 
descriptions; directions for making certain stitches; the more 
involved arithmetic problems, and so on. These things pos- 
sessa minimum of word difficulty and a maximum of thought 
difficulty. They require the imagination to construct a pic- 
ture little by little and hold it up for constant modiflcation 
as the reading proceeds. Thus, attention is fociised on 
thought. 

^"here the class appears to have the right habits of read- 
ing silently but have had insufficient practice, the obvious 
suggestion is to give them all the practice possible. Much 
supplementary reading upon which they make only meager 
reports, if any, will help. Try to secure as much general home 
reading as possible. See that an abundance of interesting 
things is available for reading and stimulate interest by hav- 
ing the children's criticisms of them given before the class. 



READING 105 

Where variability is too wide. We come now to the situ- 
ation where the range of ability within a class group is too 
wide. Here the problem is a different one from that pre- 
sented by a class with a low average ability. Here different 
treatment is needed by the different members of the class. 
It is in this situation that a teacher needs to keep the diag- 
nosis of the abilities of his class graphically before him. Each 
part of the day's work must call for exercise from each child 
in those particular skills most needing development. No 
other service of scientific measurement in education promises 
more for the future than just this. 

Uniformity in instruction for all the members of a class 
widens variability among them, making the weak ones rela- 
tively weaker and the strong ones relatively stronger. To 
prevent this widening variability more attention must be 
given to individual instruction. This does not mean a level- 
ing of all members of a class, but rather affording a maxi- 
mum of opportunity to each member to do those things most 
needful to him. Those things which he can already do well 
he is not required to do, even though some other members 
of the class need them. 

Those children falling far below the median of the class 
should be given special physical examination to discover if 
possible the cause. Sometimes eyesight is found to be poor. 
Frequently some other physical defect has prevented normal 
mental growth. Sometimes an examination by means of 
approved intelligence tests, such as the Binet-Simon tests,' 

1 See eapccially The Meaiwenera of I-nleUigmce, by L, M. Terman. 
(Houston Mifflin Company, Boston, 1916.) A simple piide for liie use of 
the intelligence scale. 




106 EDUCATIONAL TESTS AND MEASUREMENTS 

reveab that the child is mentally incapable of doing work of 
the regular school tj-pe. 

Tbe difficult but normal case ; suggestions for helping. 
If, however, the child is found nearly normal physically 
and mentally, but lias not developed ability to get meaning 
from printed language, he presents a problem in instruction 
calling for the best professional skill to solve. The following 
suggestions may help: — 

First, make use of the devices suggested for raising the 
general average of ability in silent reading given in earlier 
pages. They can be used with the pupils who are below 
standard, while the other members of the class have other 
assignments. 

Second, it is quite certain that a pupil far below the median 
in thb basic ability has never made use of printed language to 
secure help in satisfying his own childish desires. If jx>ssible, 
situations must be brought about in which his desires or 
plans depend for their fulfill ntent ujwn his reading. It may 
be, for example, that his mother or father has been in the 
habit of reading stories to him. If so, and he can be made to 
be keenly interested in a story by having a part of it read to 
him, he should have to read the rest himself to satisfy hia 
desire to know the rest of the story. Possibly he would like 
to be the leader in an occasional nature study exclusion, but. 
of course, it will be expected that he look up information 
concerning the things they see on the trip and be able to 
report later to the group. That is the business of the leader. 
Or, he might umpire the baseball game if he made sure of the 
rules; or assign the parts in the coming school entertain- 
ment, if he read the various parts carefully so as to be able 




RK\DING 



107 



s a wise assignment; or score the ekss compositiona 
on the basis of which was most interesting. Such a list of 
possible opportunities for caUing into service a child's silent 
reading ability might be largely extended. The two things 
to guard against are (1) making reading a punishment and 
(3) confusing child need with school need. The thing to 
be accomplished is to give the child a chance to do some- 
thing which he really wishes to do but cannot do without 
reading. 

Third, see to it that the regular work assigned to him in 
the school subject is not too difficult. Skill in silent reading 
is developed in early years by reading widely in relatively 
easy matter rather than from reading intensively a very 
small volume. The very slow reader is usually one who has 
never caught the knack of disregarding words and attend- 
ing to thought. In order to acquire this knack it is necessary 
to have easy reading. Therefore, while the class is studying 
a text in history, let the slow reader be assigned some early 
biographical story bearing upon the events studied. 

Let it be said in conclusion that the chief aim of these 
suggestions shall have been attained if they serve to advance 
the conviction that there is a real problem of teaching silent 
reading, distinct from the problem of teaching oral read- 
ing. If the conviction once gets hold of the teachers, the 
problem is half solved. 



QUESTIONS AND TOPICS FOR INVESTIGATION 

1. What are the chief methods by which sdiilta add new words to their 
vocabularies? Are more new words learned from the context in which 
they appear, or from the dictionary? What oan you say concerning 
the beat way to increase the vocaliulary of children? 



108 EDUCATIONAL TESTS AND MEASUREMEN' 

2. What Bre BOme of the other factors besides vocabulary involved in 
silent reading? In what grades is vocabulary the most important fac- 
tor? Moke some auggestiona for guaranteeing the intimate associa- 
tioQ of the oientai concept which a word symbolizes, and the word 
itself when it is encountered in word drills. 

3. What is the significance of speed in reading? la there any truth in the 
rather conunoo belief that one who reads slowly "gets more out of 
what he reads " 7 IF you do not know the ajiswer, can you devise 
some way U> teat it out in your class? Compare your own silent 
reading rale with that of aonie equally well educated friends. 

i. What are the chief dangers involved in having much oral reading tn 
the lower grades ? Can these dangers be safeguarded ? What types 
of reading matter do you now read orally outside the schoolroom? 
Are these the types which your pupila are asked to read orally ? 

5. What are the circumstances under which you last read aloud? Do 
your pupQs have the same incentives for reading clearly and iolereat' 
Ingly that you had on that occasion? 

6. What are some of the things you do to assist your pupils in developing 
ability to comprehend the meaning of the printed page? Do you know 
of faulty habila which some of them have which prevent their center' 
ing attention upon the meaning? Do you know which pupila read 
with accuracy? Which with speed? 

7. Can you think of a more simple way than Thorndike used to test 
whether a child is familiar witb a given word or not? Pick out about 
six pages of material from unfamiliar texla in various subjo^ have 
the pupils read them, lightly underlining all the words which they do 
not know. Rank the children according to their vocabulary knowledge. 
Now use some standardized vocabulary test and see how the two 
rankings compare. What more does the standardized test accomplish 
even if the rankings are practically the same? 

6. How long does it take you to become famQiar with the reading diffi- 
culties of each child when you receive a new class of, say, thirty chil- 
dren? Would you consider it economical if some tests were avtulable 
by means of which you could discover these difficulties as well as others 
the first day and thus prepare a chart of each child's instructional 
needs? How long at the beginning of a term could you afford to spend 
in making such a diagnosis? 

B. Do you think the abilities of your class in silent reading vary as widely 
as is indicated by the table of scores for the Kansas Silent Reading 
Tests? What do you do to take into account the variability which 
you recognize does esist? 
10. Think of the last examination you gave in reading. How satisfactory 
do you think it was from the standpoint of objectivity? Of the equality 
of the questions? Did it test satisfactorily what you are striving to 
teach in reading? 





READING 



109 



11. Wbat are the adv&nlagea of standardized teats in reading as lUted b 
this chapter over the ordinary testa in reading? Name the testa here 
discuHsed, and briefly describe each. 

13. Which teat do you think will give you the most helpful informatioD? 
Whyf 



BIBLIOGRAPHY 



n here. For other refereneea 



I. Silent Reiadino 
1. The Kama) Siien! Reading Tetti, deinsed by F. J. Kdly. For copies 
of the tests, address Bureau of Educational Meaaurements and 
Standards, Emporia, Kansas, Test I is for Grades 111, IV. and V; 
Test 11. tor Grades VI, VII, and VIII; Test III, for Grades IX, X, 
XL and Xn. 

REFBHBNcBr Kelly. F. J. "The Kansas Silent Reading Testo"; in 
Journal of Edveaiionat Piychology, February. 1B16. 
S. Gray'i Silent Reading Testi. Copies of the tests may be obtained 
from William S. Gray, School of Education, University of Chicago, 
Chicago, Illinois. 

Referbncbb; Judd, Charles H. Mearuring the Work of the Publie 
Schooh. (Cleveland Foundation Survey Report, Cleveland, Ohio.) 
Also For sale by The Russell Sage Foundation, New York City. 

Gray, William S. "A Study of the Emphasis on Various Phases of 
Beading Instruction in Two Cities"; in Elemeniary School Journal, 
vol. 17, pp. 178-86. (November, 1916.) 

Gray, William S. "A Cooperative Study oi Reading in Eleveo 
Cities in Northern Illinob"; in Bltme/itary School Journal, vol. 17, 
pp. 250-^. (December, 1916.) 

Gray, William S. Studiei of Elemeniary School Reading through 

Startdardised Tealt. (Supplementary Educational Monographs, Uni' 

versity of Chicago Press.) 

S. Brown'a Sileni Heading Te»i, Copies may be obtained from H. A. 

Brown, Bureau of Research, 25 Capitol St., Concord. New Hamp*' 

Refebgn'CES: Brown, H. A. The Meaiurement of the Ability to 
Read. (Bulletin no. 1, Bureau of Research, Concord, New Uamp- 

Brown, H. A. "The Measurement of the Efficiency of Instruction 
in Reading"; in Elemailary School Teacher, vol. 14, pp. 177-00. 
(June, 1914.) 
' i. Slarvh'i SUvnl Reading Teala. Copies may be obtained from Daniel 
Starch, University ol Wisconsin, Madison, Wisconsin. 



I 



110 EDUCATIONAL TESTS AND MEj«UBEMENTI 

Rbferencb; Starch, Daniel. "The Meaaurement of Efficiency in 
Beading": in Journal of Ed-acational Psychology, ia-naury, 1915. 
6. Tl/oradike'i Scale Alpha for ileasuring the Understanding of Sen- 
teneei. Copies may be obtained from the Bureau of Publicatioua. 
Teachers College, Columbia University, New York City. 

Kbfghgnces ; Thorndilce, E. L. "An Improved Scale for Measuring 
Ability in Reading"; in Teachera College Record, November, 191{i, and 
January, 1916. 

Haggerty, M. E. TheAbiliiy to Read: lis ifeasuremenl and Some 
Faclori Conditioning it. (Indiana University Studies, no. 34.) 

6. The Minneaofa Scale Beta. Copies may be obtained from the Bureau 
of Coiiperative Research, University of Minnesota, Minneapolis, 

7. Fordyce's Scale for Meaturing the AchiEtemenla in Reading. Copies 
may be obtained from tlie University Publishing Company, Un- 
coln, Nebraska. 

8. Courtii't Standard Research Teete in Silent Reading. Copies may be 
obtained from S. A. Courtb, 82 Eliot Street, Detroit, Michigan. 

II. Obal Rbadinq 

I. Gray'a Oral Reading Teal. Copies may be obtained from William S. 

Gray, School of Education, University of Chicago, Chicago, Illinois. 

RcFEBENCEa : Judd, Charles H. Meaauring ike Work of the PuWtc 

SchooU. (Cleveland Foundation Survey Report, Cleveland, Ohio.) 

Gray, William S. Sitidiet of Elementary School Reading through 
Standardized Teals. (Suppletnentary Rducational Monographs, Uni- 
versity of Chicago Press.) 

in. VocABULAHY Tests 
1. Jones's Scale for Teaching and Testing Elementary Reading, Copies 

may be obtained from the Rockford Printing Company, Rockford, 

lUiDois. 

Referekcb; Jones, R. G. "Standard Vocabulary"; in FovrteenA 

Yearbook of the National Society for the Study of Education, part I, 

pp. 3T-*2. 
S. Haggerty's Vimial Vocahidary Teal for Gradea 1 and 2. Copies may 

be obtained from Bureau of Cooperative Research, Univereity of 

Minnesota, Minneapolis, Minn. 

Rbfebencb; Haggerty, M, E. "Scales for Beading Vocabulary of 

Primary Children"; in Elementary School Journal, vol. 17, pp. 108-U. 

(October, 1916.) 
3, Thomdike's Visual Vocahulary Scale Alpha. Copies of the scale 

may be obtained from Bureau of Publications, Teachers College, 

New York City. 




READING 111 

References: Thorndike, £. L. ''The Measurement of Ability to 
Read**; in Teachers College Record, September, 1914. 

Thomdike, £. L. '*The Measurement of Achievement in Reading; 
Word Knowledge**; in Tethers College Record, vol. 17, pp. 480-54. 
(November, 1916.) 
^. Slarch*8 English Vocabulary Test, Copies of this test may be obtained 
from Daniel Starch, University of Wisconsin, Madison, Wisconsin. 

rV. General References 

Anderson, Homer Willard. Measuring Primary Reading in the Dubuque 
Schools. (The Harris-Anderson Tests, Dubuque, Iowa, 1916.) 

Gray, William S. ''Methods of Testing Reading"; in Elementary School 
Journal, vol. 16, pp. 281-98. (February, 1916.) 

Otis, Arthur S. "Considerations Concerning the Making of a Scale for 
the Measurement of Reading Ability**; in Pedagogical Seminary, vol. 23, 
pp. 528-49. (December, 1916.) 

Pinter, Rudolph, and Gilliland, A. R. "Oral*and Silent Reading"; in 
Journal of EducaJHonal Psychology, vol. 7, pp. 201-12. (April, 1916.) 

Richards, Alva M., and Davidson, Percy £. "Correlations of Single 
Measures in Some Representative Reading Tests**; in School and Society, 
vol. IV, pp. 375-77. (September 2, 1916.) 

Uhl, W. L. "The Use of the Results of Reading Tests as Bases for Plan- 
ning Remedial Work"; in Elementary School Journal, vol. 17, pp. 266-75. 
(December, 1916.) 

Ziedler, Richard. "Tests in Silent Reading in the Rural Schools of 
Santa Clara County, California*'; in Elementary School Journal, vol. 18» 
pp. 55-62. (September, 1916.) 



i 



CHAPTER IV 

SPELLING 



1 



I. The Pbobleu of Mrasubement in Spelling 
Difficulties encountered. Tbe measurement of spelling 
ability involves certain difficulties which are peculiar to the 
subject. Shall spelling ability be construed to mean ability 
to spell words when the attention is focused upon the ideas 
which are being expressed in writing, rather than upon the 
spelling of the words? Or is it the ability to spell words when 
the attention is focused upon the spelling of words, as in the 
case of dictated spelling lists? Upon the basis of what words 
shall one's spelling ability be measured? Shall spelling abil- 
ity be defined in terms of the per cent of correct spellings 
of a limited group of frequently used words, or shall it be 
defined in terms (A the extent of one's spelling vocabu- 
lary.* 

It appears to be the consensus of opinion that one needa 
to be able to spell correctly the words used frequently, with 
a minimum of attention or automatically, as is the case in 
writing letters, compositions, school exercises, and the Kke.i 
Otherwise, it is impossible for one to focus his attentJCMi' 
upon the ideas which are being expressed. This type rf 
spelling ability is defined as the per cent of words spdied 

' There are other phases of spelling ability, such as an idea] of coneot 
ipelling whii^ will insure tbe use of the dietionat^ in the case cf votdi 
coQcerniDg whose spelling a writer is uncertain, or the ability to detect 
words which arc mis^Klled. However, instruments for measklring t.lw^ 
phases of qieDiiig ability bave not been devised. 






Riuscll Sase Fotrndation, New York Cit? 

Dinsioa of Edncation 

Leonard P. Ayres, Director 







SPELLING lis 

cmrectly. to addition, it is desirable that one should be able 
to spell a number of words which are used only occasionally. 
In the case of the more difficult and UQUsual of these 
words it is probably sufficient if one is able to !9>ell them 
correctly when attending to them. 

We shall consider first what the words most commonly 
used are, and how to measure the ability of pupils to spell 
them. 

The foimdation words of the English language. In deter- 
mining the most commonly used words the method em- 
ployed has been to examine written material of se\-eral 
types, such as letters, newspapers, and children's composi- 
tions, and to obtain a list of the words used and the number 
of times each word occurs. Ayres * has combined the results 
of four such studies. Two of these studies were based on 
letters, the third upon newspapers, and the fourth upon 
selections of standard literature. The material examined 
in the four studies aggregated 368,000 words, written by 
twenty-five hundred different persons. 

It was the original intention of Ayres to identify the two I 

thousand most commonly used words, but this was impos- 
sible because the material examined was found to consist of 
a. few words used many times, and of a larger number of 
words used only a very few times. It was found that fifty 
different words were used so frequently that they made up 
approximately half of the material examined. In order to 
secure a list of the thousand most frequently used words 
it was necessary to include words which were found only 

' Ayrea, L. P. Meaturemetil of Abiliiy in Spring, (Bulletin of ths Divi- • i 

mm U EdlRBtion, Russell Sage FaundBtioo. New York City, 1915.) \ 



MEASUBEMEN^^^^I 



114 EDUCATION.\L TESTS AND 

forty-four times in the 368,000 words of material exam- 
ined. This list of one thousand words is the best state- 
ment which we have of the words that form the core or foun- 
dation of the English language. 

Another important study has been made by Jones.' He 
collected themes from pupils in grades two to eight inclusive. 
In order that a record of the complete writing vof^bulaiy 
of each pupil might be obtained, a large number of composi- 
tions were written, the number per pupil ranging from 56 
to 105. A total of 75,000 themes, consisting of a total of 
15,000,000 words and written by 1050 pupils residing in 
four States, were examined. However, only 4532 different 
words were used by these pupils. . 

Making a spelling test. After we have a list of the most 
commonly used words, such as Ayres has ^ven us, there 
remains the problem of constructing a test to measure the 
automatic spelling abihty of pupils. It is a well-known 
fact that some words are more difficult to spell than others.* 
The words included in a test either must be equal in difficulty, 
or their relative difficulties must be known. Otherwise we 
will be using a measuring instniment consisting of unequal 
units, but will be considering the units to be equal. This 
condition would cause the measures made with such an 
instrument to be inaccurate. Pupils spelling only a few 
words correctly would receive a score higher than they 

' Jones, N. Franklin. CoJicrele Invedigationa of Ike Material of EnglCik 
Speiling. (University of South Dakota Bulletin. 1913.) 

' The spelling difliculty of a word has two interpretations. It may be 
taken to mean the difficulty which children experience in learning to spell 
it. It may also refer to the frequency with which it is misspelled. The 
latter meaning will be used in this chapter. 





R SPELLING 115 ^H 

1, and the bright pupils vrould receive a, score loner ^^H 

than they deserved. ^^M 

The spelling difficulty of words for a given group of chil- ^^ 

I 



dren may be determined by having the words spelled by 
them. From the per cent of correct spellings of each word the 
relative difficulty of the words may be calculated, ^ Words 
which are misspelled an equal per cent of times by pupils of 
a given grade are equal in difficulty for that group. In the 
absence of this information it is practically impossible for 
a teacher to judge the difficulty of the words. Buckinghai 
concluded that the judgment of a single teacher is almost 
of no value, " It may be good and it may be bad; and it 
about as Hkely to be the one as the other." 



II. Spellinq Scaijs 
1. The Ayres Spelling Scale 
How constructed. To determine the words of equal diffi- 
culty and the relative difficulty of the groups of words, Ayres 
divided the thousand words into fifty lists of twenty words 
each. Each list of words was spelled by the children of two 
consecutive grades in a number of cities. The thousand 
words were then divided into another fifty lists of twenty 
words each. Each of the new lists was speUed by the 
children in four consecutive grades. In all 70,000 children 
spelled twenty words, making a total of 1,400,000 spellings, 
or an average of fourteen hundred spellings of each of the 
thousand words. 

> See Buckingham, B. R. Spelling Ability, lit MearuTemenl and Di»- 
Inbutton. p. SB, and [ullowmg for the method. (Teachers CoUege Contnbu- 
tioiu to EducBtioQ, DO. sg, 1913.) 





lUBEMENT^^^H 



116 EDUCATIONAL TESTS AND MEASl 

Upon the basis of this informatJOD Ayres classified the 
words into twenty-six ({roups, the words of each group being 
approximately equally difficult for school children of a given 
grade,^ This classified Ust, together with the per cent of 
pupils in each of the grades who spelled the words of each 
list correctly, has been printed with the title. Measuring 
Scale for AbilUy in Spelling. This scale is reproduced as 
Fig. 5.* 

Pupils who are not tested. When a pupU spells correctly 
all of the words of a given list, we do not have a measure of 
his spelling ability. We simply know that he can spell these 
words correctly; we do not have any information concerning 
how far beyond this list his spelling abihty extends. In fact, 
the pupil lias been given no opportunity to show how well 
he can spell. It is a well-known fact that the pupils of any 
grade or of any class are not equal in ability, but exhibit a 
wide range of abihty. Thus in testing a class it is necessary 
to use words for which the average per cent of correct 
spellings is less than one hundred. Ayres recommends that 
in making a test for the pupils of a given grade the words 
be taken from the column for which an average of eighty-four 
per cent of correct spellings may be expected. ' 

Figure 6 represents a typical result of using the 'words 

' For the details of the method employed see Ayres, L. P. Metuuremenl 
of Spelling Ability, pp. 88-35. 

' The author la indebted lo Dr. Ayrea for permission to reproduce 
this scale. 

* The reader should not confuse scores or meaaurea irf ability with 
school marks. The per cent of correct spellings is a measure. The school 
nmrk is the meaning which tlie school attaches to that measure. The fact 
that both the meAfUre and the school mark may be expressed in per 
cents does not make them the same. See chapter ix tor a more complelfl 
discussion of this point. 



4 



chosen as Ayrea recommends. The class average is eighty' 
four per cent, but those pupib who spelled all of the words 




1 

.J 



correctly have not been tested. Those who misspelled only ^ 
one or two words probably have not been tested satisfao J 
torily. 

Otis ' presents facts from which he concludes that the most 
reliable measures of spelling ability are obtained by using 
words for which there is an average of fifty per cent of cor- 
rect spellings. In support of this conclusion he points out 
that a list of words for which the average per cent of correct 
D Schnol and Sorictn, 





118 EDUCATION/a TESTS AND MEASUREMENTS 



spellings was either zero per cent or one hundred per cent 
would yield a measure of zero reliability. Likewise a list of 
words for which the average per cent of correct spellings was 
ten per cent or ninety per cent would yield measures only 
slightly more reliable. Hence it seems natural that the mosi 
reliable measures would be obtained by using a list for which 
the average per cent of correct spellings was fifty. On the 
other hand, some writers claim that it is not wise to have 
pupils spell words incorrectly. Every repetition tends to fix 
a habit. 

Ayres gives no satisfactory justification for recommending 
the choice of words for which an average of eighty-four per 
cent of correct spellings may be exi>ected. When measuring 
the spelling ability of children in Springfield, Illinois, Ayres 
used words for which seventy per cent of correct speUings 
had been obtained. For the Survey of Cleveland, Ohio, the 
words were chosen from columns for which the average 
per cent of correct spellings was seventy -three. Thorndike 
has used words for which the per cent of correct spellings 
is fifty. ^ For these reasons it is probably best to choose 
words from columns for which the average per cent of cor- 
rect spellings is approximately seventy. 

How many words to use. Another question which must be 
considered in making a spelling test is the number of words 
it is necessary to use. In general the ability to spell one word 
is separate and distinct from the ability to spell any other 
word. AbiUty to spell, therefore, consists of a large number 
of abilities to spell specific words. This being the case it 

' Thorndike, E. L. "Means of Wcasuring School Arhievement ir 
ing"; in Educalional Adntinislration and Siipermaion, vol. i 





SPELLING 



would be necessary to use all of the thousand words of 
Ayres's hst in order to obtain a complete and accurate meas- 
ure of a pupil's ability to spell the most commonly used 
I words. However, it is possible to seciu^ a measure which is 
{ representative of the pupil's ability to spell these words by 
using a smaller number of words. This is possible in just the 
same way that it b possible to determine the quality of a 
load of wheat or a vat of cream by the examination of a 
I sample. 

( How many words are necessary in making a spelling test 

' depends upon what is desired. Relying upon the theory 

' of random sampling, Thorndike believes a smaJl number 

I of words is sufficient to measure the spelling achievement 

of B large school system. A test consisting of only ten 

words has been used in a number of school surveys. This 

number is probably sufficient for the measure of a large 

school system, but if it is desired to obtain a measure of the 

spelling ability of individual pupils, a larger number must 

be used. Otis ^ says that a twenty -five word test gives a very 

poor measiire of individual ability, and that at least one 

hundred words should be used, better four hundred or five 

hundred words. Starch recommends the use of two hundred 

words. 

Methods of giving the test. The words which make up 
a test may be dictated to the pupils as separate words, or 
they may be embedded in sentences which are dictated. 
Furthermore, the dictation of the sentences may be timed 
so that the pupils are forced to write at their normal rate. 
Investigation has shown that the per cent of correct spell- 
' Loc. cit„ pp. 670, 



I 



120 EDUCATIONAL TESTS AND MEASUREMENTSl 

ings is higher when the words are dictated separately than 
when they are dictated in timed sentences and the pupils 
are forced to write at their normal rate. According to Courtis 
the per cent of correct spellings is about five greater when 
the words are dictated in lists. Fordyce has found this differ- 
ence to be between ten and fifteen per cent. 

This means that the ability to spell words when the at- 
tention is focused upon the spelling, as is the case when the 
words are dictated separately, is not the same as the ability 
to spell the same words when the act of spelling is in the mar- 
gin of one's attention. In writing letters, compositions, and 
the like, the spelling must be carried on in the margin of the 
attention because the ideas which are being expressed must 
occupy the focus of the attention. This is particularly true 
of the foundation words of the language such as we have in 
the Ayres list. The words of this list constitute over ninety 
per cent of the words we use. Hence, by using the words 
embedded in sentences and dictated rapidly enough to force 
the child to write at his normal rate, we measure the spelling 
ability which functions in one's everyday writing. 

Letters per minute. Pupils may be caused to write at 
approximately their normal rate by dictating the sentences 
at that rate. The standards for speed of handwriting are as 
follows in terms of letters per minute: second grade, 36 let- 
ters; third grade, 48 letters; fourth grade, 56 letters; fifth 
grade, 65 letters; sixth grade, 72 letters, seventh grade, 80 
letters; eighth grade, 90 letters. The dictation of a sentence 
requires some additional time, probably 10 per cent. For ex- 
ample, in the case of the sixth grade, instead of dictating 
at the rate of 72 letters in one minute, 66 seconds should be 



SPELLING Ul'^^l 

allowed for words totaling 72 letters. The number of seconds ' 

to be allowed per letter for the several grades are as follows : — 

n 1.83 

m 1.38 

IV 1.18 

V 1.01 

VI W 

Vn 83 

vni 7s 

If the sentences contain more than thirty to forty letters, 1 
they should lie dictated in sections, so that the pupils' writ- 
ing will not be slowed up by trying to recall what has been 
dictated. Furthermore, tests of speed in handwriting have 
showed that all pupils do not normally write at the same 
rate. For this reason provision must be made for those 
pupils who are accustomed to write more slowly than the 
standard rate. This can be done hy having none of the test 
words come at the end of the sentences, and requiring all 
pupils to begin upon the next sentence as soon as it is dic- 
tated, even if they have not finished writing the prece<^g. 

What the Ayres Scale really is. Strictly speaking, the 
Ayres "Measuring Scale for Ability in Spelling" is not a 
measuring instrument in itself, but rather a list of the foun- 
dation words of the English language, classified into twenty- 
Mx groups according to spelling difficulty. The teacher should 
use this list as a source of words for constructing spelling 
tests. These tests should be constructed according to the 
following principles, which have been considered in the pre- 
ceding pages: — 

I . The words for a test should be chosen from one colui 




122 EDUCATIONAL TESTS AND MEASUREMENTS! 

so that these will be equally difficult to spell. If this is 
done, the inequality of difficulty must be recognized, if an 
accurate measure is secured, 

2. Twenty words are probably sufficient to secure a re- 
hable measure of the spelling ability of a class. At least 
fifty words should be used to secure a reliable measure of 
the spelling abihty of individual pupils. More accurate 
measures will be obtained by using one hundred words. In 
the case of the upper grades it will be necessary to use words 
from more than one column. When this is done the relative 
difficulty of the words must be recognized to secure an ac- 
curate measure. 

3. In order that the words may be difficult enough to 
really measure the spelling abihty of all pupils the words 
should be chosen from columns for which the standard per 
cent of correct spellings b approximately seventy. For 
the lower grades it is probably best to use words for which 
the standard per cent of correct spellings is from fifty to 
sixty-six. If the words are to be used in timed sentences it 
will probably be satisfactory to use easier words. 

4. The words should be embedded in sentences, and the 
sentences dictated at approximately the standard rate of 
handwriting for the grade. Test words should not occur 
at the end of the sentences. 

Directions for giving a timed sentence test The fol- 
lowing test has been constructed in accordance with the 
above principles. The directions given below should be 
followed in giving it: — 

1. See that the pupils are provided with two or three 
sheets of paper, and with either pencil or pen and ink. If 



i not 



r 



SPELLING 



pennls are to be used, they should be well sharpened. If 
pen and ink are used, good pen points should be provided. 

2, Say to the pupils: " I have some sentences which 
want you to write as I dictate them. I am going to dictate 
them rather rapidly, possibly more rapidly than some of 
you can write. If you have not finished writing 
tence when I begin to dictate another, I want you to leave 
it and begin on the new sentence. If there are any words you 
cannot spell you may omit them. Take time to dot your i's 
and cross your t's. If you have any question about what you 
are to do, ask it now because you cannot ask questions after 
I begin to dictate." 

3. Make certain that all pupils understand what they 
are to do. It is weU to ^ve a short prelimmary practice in 
writing from dictation if the pupib are not accustomed 
to it. For this purpose use some simple selection. 

4. Dictate the first sentence when the second hand of 
your watch is at 60. When it reaches S7, dictate the second 
sentence. When it reaches 13, dictate the third, and so on. 
Dictate the sentences distinctly, but do not repeat. It ia 
advisable for the teacher to practice dictating the sentences 
according to the directions before attempting it with a class. 

5, Stop the pupils promptly at the time indicated. Allow 
no corrections to be made. Ask the pupils to turn their 

' papers over and write their name and grade. Collect the 
papers. 



133 ^1 

If ■ 

led. H 

ch I H 

of ^M 



;UR£M£NTa^^| 



Hi EDUCATIONAL TESTS AND MEASl 



A Timed Sentence Spelling Tbbt or Finr Woeds taken 
FBOM Column O ' 

(Arranged for a Fourth Grade) 

(60) The puftiic appear to want it. 

(27) The population of the diglrict is five hundred. 

(13) We refiue to atUnd the meeting. 

(44) My uru^ will remain until the man comes. 

(23) Forty members of the fourth company will drill. 

(B) The j'udffe knew the chief of police was there. 
(52) The whole address was tiresome. 
(23) The comfort of out friend is to be considered. 

(7) A perfect figure was drawn. 
(33) During the month of August we elect a teacher. 
(17) The naxy will go farther away. 
(43) A board from the shed ia needed. 
(15) The house was between the jaU and the store. 
■ (38) I am getting the eeamd order. 

(26) The station is loortk more money. 

(57) Madam will return Thursday morning. 

(82) We don't request attendance. 

(60) What is the objection to a personal letter? 

(41) A sudden change in the weoiher will come soon. 
(25) Duty before jAcasure is an old saying. 

(2) Eight men intend to retire from the house. • 

(42) It would be proper to call. 

When the second hand reaches 7, stop the work. 

2, The Buckingham Spelling Scale 

Starting with a list of about five thousand words o 

to at least two out of five spelling books, Buckingham ' 

selected by means of an elaborate statistical procedure two 

' Thi> test 19 one of it irrkei lioviscd hy the author. Specimen ec^ies may 
be obtained from tlic Iturmu uf KdiK'fttional MeMUKmcuti and Standards, 
Emporin. KansM. 

* Buckingham. B. It, SprUint Atililn, It" Sltaiurrmnl and Diitnbit- 
(ioR. (ToM^ CoU^e CoBtrlbutkHU tv Educmtiun. do. 59. 1913.) 



SPELLING ifUf^ 

liata of twenty-five words each. The purpose of the selection 
was to secure "words which were easy enough in the third 
grade and hard enough in the eighth grade to afford a teat 
in those and therefore intermediate grades, and which 
showed regular increases in per cent correct from grade to 
grade." The diEBculty of each word was determined in 
terms of a common unit. Since the difficulty of each word 
is known the entire list may be used as a test, or any desired 
number may be selected from it.' However, the fact that 
the scale consists of only fifty words limits its usefulness to 
the general measurement of groups of pupils,' 

S, Starch's Spelling Scale 
Measuring ttie extent of ability to spell. In securing a 
measure of the number of words which a pupil or a class can 
spell correctly we are not concerned simply with the most 
commonly used words of the English language, but rather 
with all the words of the language.' Starch has prepared a set 
of six word lists, each consisting of one hundred words. The 
words for these tests were chosen by taking the first defined 
word on the even-numbered pages in Webster's New Inter- 
national Dictionary {1910 edition). Technical, scientific, 
and obsolete words were discarded from the list. The re- 

' See E. L, Thorntlike, "Means of Measuring School Achievement in 
Spelling" (Educational AdminialTaliim aitd Supervision, vol. i, p. 306), f(H 
an arrangement of words from Buckingham's list in sentences. 

* See on this point Tidyman, W. F., "A Descriptive and Critical Study 
of Buckingham's Investigation of Spelling EfBciency"; in E^itaaiontd Ad- 
minittration and Supervition, vol. ii, pp. £90-304. 

' Starch, Daniel. "The Measurement of Efficiency in Spelling, and the 
Overlapping of Grades in Combined Measurements of Reading. Writing, 
aadSp^^ng"; in Journal of Educational Psychology, vo\.0. (March, 19 IS.) I 



i 




186 EDUCATIONAL TESTS AND MEASUREMENTS 

tnaining six hundred words were arranged alphabetically 
according to size. When arranged in this way they were di- 
vided into six lists of one hundred words each by taking the 
first, seventh, thirteenth, a.nd so forth for the first list; the 
second, eighth, fourteenth, for the second list; and so forth. 
Each of these lists consists, therefore, of one hundred words 
taken at random from the non-technical words of the Eng- 
lish language. Such a list is an instrument for measuring 
the extent of a pupil's correct spelling vocabulary. 

The words in each list are arranged according to the num- 
ber of letters they contain. Ayres ' found for the words of 
his list a very high correlation between the length of words 
and spelling difficulty. Assuming this to be true for Starch's 
list, the words may be considered to be arranged in the 
general order of spelling difficulty by groups. A pupil's score 
b is the number of words spelled correctly, no account being 
taken of the difficulty of the word spelled. The score is an 
index of the total number of the non-technical words of the 
English language which the pupil can spell correctly. 

Starch's spelling lists. Lists I and II are given on the 
following pages. In using these lists as tests the words are 
simply dictated, and the pupil allowed to focus his atten- 
tion upon the spelling. To secure a reliable measure both 
lists should be used and the two sets of scores averaged. 




St\bch'b Spelling List, No. 1. 
1. add 5. rat 9. cart 

%. but fi- sun 10. come 

3. get 7. aliun 11. easy 

4. low 8. blow 12. fell 
Ayres, L. P. Meaturement of Spcltii^ AlnlHy, p. 38. 



I 



18. foul 

14. gold 

15. head 

16. kiss 

17. long 

18. mock 
19; neck 

20. rest 

21. spur 

22. then 
2S. vile 

24. afoot 

25. black 

26. brush 
?7. dose 

28. dodge 

29. faint 

80. force 

81. grape 

82. honor 
3S. nfbce 

84. paint 

85. prism 

86. rogue 

87. shape 

88. steal 

89. swain 

40. title 

41. wheat 



1. lur 

2. cat 
8. hop 

4. man 

5. row 

6. Up 

7. awry 

8. blue 



SFKIJ.TNG 




42. accrue 


71. flourish 


48. bottom 


72. luckless 


44. chapel 


78. national 


45. dragon 


74. pinnacle 


46. filter 


75. reducent 


47. hearse 


76. standing 


48. laden 


77. venturer 


49. milden 


78. ascension 


50. pilfer 


79. dishallow 


51. rabbit 


80. imposture 


52. school 


81. invective 


63. shroud 


82. rebellion 


54. starch 


88. scrimping 


55. vanity 


84. unalloyed 


56. bizarre 


85. volunteer 


57. compose 


86. cardinally 


58. dismiss 


87. connective 


59. faction 


88. effrontery 


60. hemlock 


89. indistinct 


61. leopard 


90. nunciature 


62. omnibus 


91. sphericity 


63. procure 


92. attenuation 


64. rinsing 


98. fulminating 


65. splashy 


94. lamentation 


66. torpedo 


95. secretarial 


67. worship 


96. apparitional 


68. bescreen 


97. intermissive 


69. commence 


98. subjectively 


70. estimate 


99. inspirational 




100. ineffectuality. 


I's SpEiiUNa List, 


No. 2 


9. cast 


17. look 


10. com 


18. mold 


11. envy 


19. part 


12. feud 


20. ruin 


18. game 


21. take 


14. grow 


22. tree 


15. home 


23. well 


16. knee 


24. allay 



127 



1 




^^H 


les EDUCATIONAL TESTS AND MEASUREMENTS.^^! 


». blaze 


SO. portal 


75. reformer ^^^ 


as. buggy 


51. recipe 


76. thorough ^^H 


87. clown 


S3, scrape 


watering ^^H 


«8. doubt 


53. simple 


78. belonging ^^^ 


29. false 


54. alrain 


7». displayed 


SO. forth 


55. weaken 


80. indention 


31, grass 


56. breaker 


81. mercenary 


3S. house 


67. congeal 


82. redevelop 


33. money 


SS. disturb 


83. senescent 


34. paper 


59. foreign 


84. uncharged 


35. quill 


60. hoggery 




36. rough 


61. meaning 


86. centennial 


37. shout 


62. onerate 


87. constitute 


38. stick 


63. provoke 


88. exaltation 


39. swear 


64. salient 


89. bvocative 


40. trump 


65. sUtion 


SO. personable ] 


41. whirl 


66. trample 


91. strawberry 


4%. action 


67. abstract 


92. concentrate 


43. bridle 


68. buUetin 


93. imaginative 


44. charge 




94. mathematics 


45. driver 


70. eugenics 


95. selfishness 


46. finger 


71. friakful 


96. collectivity 


47. heaven 


72. luminous 


97. marriageable 


48. legend 


73. opulence 


98. agriculturist 


49. moUey 


74. planchet 


100. reiinquishmeot^^H 




in. Standabds 


V 


The Ayres Scale 


In classifying the words of his list ac- 


cording to diiBculty, Ayres determined the average per 


cent of the pupils 


of each grade who spelled the words 


correctly. Thus the words of column 


were spelled correctly 


by SO per cent of the third-grade pupils, 73 per cent of the 


fourth-grade pupils 


84 per cent of the fifth-grade pupils, 92 


per cent of the sixth-grade pupils, 06 per cent of the seventh- 


grade pupils, and 99 per cent of the 


eighth-grade pupils. 


These per cents, which are printed at the head of each 






11 




I , _ 


d 



SPELLING 



column, represent the average spelliag ability of pupils ii 
the several grades when the words are dictated in lists. 
When the words are used in timed sentences the averages 
have been 5 to 15 per cent lower. 

In Boston minimum word lists for each grade have been 
carefully built up on the basis of the words which the pupils 
use in their written work. When the Boston pupils were 
tested on the words of the Ayrea Scale which are included in 
their several grade lists, the per cent of correct spelhngs wna 
conspicuously above the standards given by Ayres.' In re- 
porting that study Ballou suggests that " this may be due to 
the fact that the Boston pupib had been taught the words; 
whereas, the pupib in the eighty -four cities where Dr. Ayres 
gave his Hsts, and on the results of which he standardized 
the words for his spelling scale, may not have been taught 
them." 

For thj^ reason it may be seriously questioned whether 
the averages which Ayres gives are satisfactory standards, 
of spelling ability for the foundation words of the language. 
Ayres says: "Probably the scale will have served its great- 
est usefulness in any locality when the school children have 
mastered these one thousand words so thoroughly that the 
scale has become quite useless as a measuring instrument." 
In the past we have not had the advantage of such a list and 
have distributed our efforts in teaching spelling over a very 
much larger list of words. If we accept these one thousand 
words as the foundation words of our language, we should 
place prime emphasis upon teaching them. This being the 

< BbBou, V. W. "Me«aurini; Boston's Spelting Ability by the Ayram | 
Exiling Scale"; in School and Society, vol. v, pp. 267-70. 



[29 ^H 



i 





M 

130 EDUCATIONAL TESTS AND MEASUREMENTS ^^H 

case a satisfactory eighth-grade standard would approxi- 
mate one hundred per cent for all of the words. For the 
preceding grades, the standard would be one hundred per 
cent for the words of the list which the pupils had been 
taught. For example, the easiest nine hundred words might 
be used for the seventh grade, the easiest seven hundred and 
fifty for the sixth grade, and so forth. The use of the scale in 
the way Ayres suggests would seem to lead to standards of 
this type. The distribution of the words among the several 
grades and the optimum standards must be determined by 
experi mentation . 

The Starch tests. Starch gives the following standards 
for his tests based on their use with over twenty-five hun- 
dred pupils. 




<j™d« 


' 


" 


III 


jr 


r 


VI 1 VII 


vni 


Per cent of words 
spelled correctly. . . . 


10 


30 


40 


fil 


61 


71 


78 


85 


These standards are interpreted thus: the average eighth- 
grade pupil should be able to spell correctly eighty-five per 
cent of the non-technical words of the Engfish language, 
or eighty-five of the one hundred words in any one of 
Starch's tests. 

IV. How TO LOCATE SPELLING DlFFlCDLTIES 

Locating bad spellers. On page 119 it was stated that a 
list of ten to twenty words would give a reliable measure 
of a school system, or a good-sized class, but that a test of 



^^^^^^^ SPELLING 131 

one hundred words or more was necessary to obtain a 
reliable measure of tlie spelling ability of individual pupils. 
■ From the standpoint of making her instruction most effective 
the teacher is not so much concerned with securing a reliable 
measure as in locating the pupils who are below standard, 
and who for that reason need instruction. A test of twenty 
words will locate many of the cases, but a test of fifty or 
one hundred words will be more effective. I 

A tow class average may be due to one or more of three 
conditions: — 

1 . The class as a whole may be unable to spell certain words. 

2. Certain pupils may be unable to spell a large number of 
the words of the test. 

3. The errors may be rather uniformly distributed as to , 
both words and pupils. 

To determine the extent to which each condition causes 
the low class average, the teacher should make the following 
type of tabulation from the test papers. 


WerOt o/Ihe tsU 


1 


' 


' 


' 


' 


' 


' 


' 


' 


• 


,0 


" 


IS 




I 


I 


I 


I 


I 




I 




I 


c 


'l 


I 










DCga 




c iDdrcota the word wu mrrecUy ipcllcd. 

Although these words are listed by Ayres as being equally 
difficult for pupils in general, they are not necessarily so for 



r 



138 EDUCATIONAL TESTS AND MEASUREMENTS^ 

particular pupils. Obviously in the class here represented 
"catch" and "clothing" need general emphasis, while only 
certain pupils need to give attention to " black," " began," 
and "unless." Pupil II has misspelled five out of six words, 
and hence probably is a " poor speller," 

ladividuality in spelling difficulties. Simply to know that 
a pupil is below standard in ability is of little value to the 
teacher, because spelling ability is specific and not general. 
In general the ability to spell one word does not imply abil- 
ity to spell another word, nor does the lack of ability to spell 
a given word indicate that a pupil cannot spell another word. 
Hence the teacher should make a very careful diagnosis cJ 
the spelling abihty of each pupil whose test score is below 
J standard to ascertain just what words he cannot spell of 
those he is expected to spell. 

This b accomplished by gi%'ing the pupils below standard 
a test including all <^ the words which they are expected to 
beableto^iell. Such a test is not fi» the purpose of measure- 
mmt bat sbookl be iLoo^t tjt as the first step in the teach- 
uig of spdhng. EmA papH should be required to make fnKD 
this test a list of aH the nwds which be has spdkd inoor- 
RCtly. The words of this list are the ones be needs to study. 
It is obvious that to ask a puptl to study words which be 
can alraaily speU oonectty is to ask him to use his time 
«ilho«t pcoGt. 

BOOK.** Certain fntquentlf used words are 
•mryfnqpKmifyni^Ai. JciDes ' has ^Toa us a hst of 100 
worrli vfaU be fe— d mw^wtlW mist baianitty in cfail- 
dicB*soQiii|ioBtM)ds. He calls then U«**QBehimdiedSpcll- 





SPELLING 



133 



ing Demons of the English Language." Nine tenths of these 
words are found in Jones's Ibt for the second and third grade. 
Four fifths of these words are found in Ayres's list. A teacher 
will make no mistake in empliasizing these words in the 
teaching of spelling until the pupils can spell them correctly. 







{Jonc.) 






which 


can't 


guen 


they 


their 




says 


hair 


there 


loose 


having 


break 


gepSMite 


lose 


just 


buy 


don't 


Wednesday 


doctor 


again 




country 


whether 


very 


businee* 


February 


believe 


none 


many 


know 


knew 


week 


friead 


could 


laid 


otUn 


some 


ieema 


tear 


whole 


been 


Tuesday 


choose 


won't 


■ince 


wear 


tired 


cough 


used 


answer 


grammar 


piece 


always 


two 






where 


too 


any 


ache 


women 


ready 


much 


read 


done 


forty 


beginning 


said 


hear 




blue 


hoarse 


here 


trouble 


though 


■hoes 


wriU 


among 


coming 


to-night 


writing 


busy 


early 


wrote 


beard 


built 


instead 


enough 


does 


color 


easy 


truly 




making 


through 


sugar 


would 


dear 


every 


straight 



Types of misspellings. A pupil's spelling difficulty is not 
completely diagnosed when tlie words he does not spell 
correctly are located. Errors in spelling are seldom if ever 
distributed uniformly among the several letters composing 
the word. Neither does it appear that there is much uni- 
formity in the location of errors in different words. Certain 



[ 



IS* EDUCATIONAL TESTS AND MEASUREMENTS 

words are misspelled in only a few ways, while other words 
are misspelled in many ways. Certain misspellings occur fre- 
quently, while others seldom occur. In Table XVIII the mis- 
spelling of certain words found in the papers of 80 seventh- 
grade pupils are given, together with the frequency of each. 
The words were taken from column S of the Ayres Scale. 
Where no number follows the word that type of misspell- 
ing of the word occurred but once.' 

Table XV 111. The MiaepELLENGS op Eighty Seventh-Grade 
Pdpils on a C01.UMN Spelling Test 



1. affair 




sertain 




marage 


affere 








raerriage 






secrtnin 


X. 


mention 


afair (2) 


IV 


difference 




mension (9) 


aff aired 




diSerance (10) 




menflioned 


affer 




diifierencc 






n. assist 


V 


examination 




menchioD 


assit (3) 






XI. 


motion 


asiat («) 




examition 




raoshen 


ascist 








moticcm 


Basest 




excamation 






Bdsaist 








montion 


Bssceat 






neither 


assiast 


VI 


government 




neather («> 


acaiat 




goverraent (9) 




nether 


aciat (4) 








niether (2) 


accUted 




govemement 




nieghter 


assontoDt 








assised 


VII 


improvement 




oppinion (5) 


aceessisc 




improvment (7) 




opinon (e) 


atfest 








opinton 


astist 


vin. 


investigate 
investflgate (3) 




oppoinen 
oppinuia 






envesigatige 






m. cerlmn 




investiage 




opion (3) 


certian (7) 


IX 


marriage 




oponioQ (8) 


serten 




maxrage (5) 




oppion (a) 



±Knni 



w 


SPELLING 






^^ cpinnioD 


petdluar 




poasibbe 


opoin (8) 


pe^tuliar 




posiple 


opionioa 


pccticular 


XVI. 


serious 


XIV. particuiar 


pertlctura] 




cyreaus 










particuler 


peculiar 






partictuler 


pectulair 




serrioua (2) 




XV. possible 




cerioua 


pwtidu 


possableC*) 






perb»d» 


poslble 


XVII. 


stopped 




posable 




sloped (13) 


pArtiukc 


posiable (e) 




stopts 


partukr 


posaiable (5) 




atocted 


paticular 


posobile 




stop 



Teaching the pupil to correct his errors in spelling. Spell- 
ing consists in forming torrent and fixed assodationa " be- 
tween the successive letters of a word and between the 
word thus spelled and the meaning." ^ The laws governing 
the formation of fixed associations are those of habit forma- 
tion. The first step in habit formation is to get the attention 
of the child focused upon the associations to be formed. The 
second step is to secure sufficient repetition. Repetition of 
the associations is secured both through drill and through 
using the word in written expression. The pupil must give 
attention to the repetitions of the associations in order to 
insure that wrong associations will not be made. 

Numerous experiments have shown that pupils can spell 
correctly a large per cent of the words in the lists in spellers 
before they have studied thera. Because of thb fact the 
assignment of the spelling lesson should include the dicta- 
tion of the words to the pupils so that each might know 
what words he needed to study. The teacher would also 
learn what words she should emphasize in her instruction. 
' Freeman, P. N. Pgychoiogy of the Common Branches, p. 115. 




136 EDUCATIONAL TESTS AND MEASUREMEN': 

Some writers state that a. pupil should not be permitted 
to spell a word incorrectly when it can be avoided, and for 
this reason pupils should leam to spell words correctly 
before they are required to write them. Just how important 
it is to do this we do not know. In certain cases it appears 
that a child or an adult learns to spell certain words cor- 
rectly by having his attention directed to his errors. The 
fact of his error serves to direct his attention to learning to 
spell the word correctly. Those who believe that evil effects 
will come from having pupils write words which they cannot 
spell correctly may direct them to omit those words which 
they think they cannot spell correctly. 

The dictation of the words in assigning the spelling les- 
son, together with the detailed testing of the pupils as 
suggested on page 131, reveak to the teacher the words 
upon which she must exercise her ability as a teacher of 
spelling. It also reveals to her the pupils to whom instruc- 
tion should be directed in the case of each word. Particular 
methods and devices by means of which the laws of habit 
formation may be fulfilled are described in books which 
deal with the teaching of spelling.* 

Causes of some misspellings. Certain associations in the 
spelling of a word appear to be more crucial than others. 
Take, for example, the word " examination," the fifth word 
in Table XVIII. The letters e-x and t-^o-n were correctly 
associated in every instance. All of the errors occurred in 
the other syllables. A study of Table XVIII shows that cer- 
tain forms of misspelling occur more frequently than others, 

' A very good chapter (vi) will be found in Freeman's Psychology of the 
Common Branches. Bee also Cook and O'Shcu, TJie Child and hU Spelling. 



SPELLING 137 

and that most of the misspellings may be attributed to cer- 
tain specific causes. Formsof misspelling such as" partiular," 
"partuler," "opinon," " impovement," "possibbe," are 
probably due to carelessness or accident. Relatively few of 
the misspellings in Table XVIII may be assigned to this 
cause. Errors of this type probably cannot be entirely ehra- 
inated from uncorrected manuscript. However, drill will 
reduce the number of such errors to a satisfactory minimum, 

A more prolific source of error is mispronunciation of the 
word by the pupil. He may have acquired this from the 
teacher, but more likely from those with whom he associates 
outside of school. Or it may have been acquired from lack 
of attention to the form of the word. Such misspellings as 
the following are probably caused by mispronunciation: 
" perticular," "particlar," " investagate," "goverment," 
" examation." 

A very striking instance of this type of spelling error and 
its cause came to the attention of the writer a few years 
ago. A man who had taught geometry for a number of years 
used the word "frustum" in a manuscript, spelling it "frus- 
trum" which agreed with his pronunciation of the word. 
This manuscript was read by a number of well-known 
mathematicians who read it critically. Only two noted the 
misspelling of the word, and one mathematician, who took 
much pride in his ability to spell correctly and who was the 
author of several textbooks, admitted that he had always 
pronounced and spelled the word "frustrum," 

Other errors listed in Table XVIII are due to certain phonic 
irregularities of the English language, for example, certain 
misspellings of "assist," "certain," "affair," "marriage," 





138 EDUCATIONAL TESTS AND MEASUREMENTS 

"motion," "neither," and "serioua." Still other errors, 
such as, "stoped," and " improvment," are due to certain 
silent letters. In a few cases it appears that the pupil was 
not acquainted with the form of the word. Such cases prob- 
ably should not be counted as errors but rather as words un- 
known. 

Good teaching of spelling. In teaching the spelling of a 
word the child's attention should be directed to the crucial 
associations. If the word is one like "government," his 
attention should be called ta the correct pronunciation. If 
it is such a word as "their," his attention should be called 
to the use of the word. To eliminate speUing errors a pupil's 
attention should be called to his particular error and he 
should be helped to remove the cause. If the cause is mis- 
pronunciation, see that he learns to pronounce the word 
correctly. If the error is due to a confusion of letters the 
pupU should be given some device to prevent this confusion. 
The following is a device which may be used for especially 
difficult words: — 

PaT-tic-u4ar 

I frequently misspell in writing compositions but now 

I am going to learn to spell it correctly. My teacher tella me that I 

do not look at tlie syllables and letters closely enough. 

I am going to do it now with. care. I see that the word has 

— ■ syllables. The first syllable is . The vowel of this 

syllable is , the first letter of the alphabet. The last sylla- 
ble ia~ and the vowel is also ■ — : The word contains 

■ ■ ■ letters, the other vowels sre and . Now that 

I have looked at the word carefully I am going to be very 

in spelling it. I am also going to be ^~ — in pronouncing it. I 
am going to remember that the vowel in the first syllable and in 

the last syllable is an . I am not going 'to pronounce those 

syllables as if the vowel were e instead of . I am going to be 



f very- 

I waat it to 1: 



about both spelling and pronouncing tbia word. I 
e correct in every . 



Thb device is used by providing the pupil who needs in- 
struction with a printed or typewritten copy.' The pupil is 
required to fill in the blank spaces correctly. This is re- 
peated until the correct associations are fixed. 

Devices for improving spelUog. The followmg device 
serves to direct the pupil to see his errors in a wholesome 
way. It has yielded very gratifying results in the Training 
School of the Kansas State Normal: ' 

When the spelling sentences or lists have been written 
each pupil is required (1) to mark each word, the spelling 
of which he doubts; (2) as far as possible he is encouraged to 
test the validity of his doubts by known means outside of 
the dictionary, finally checking up all doubted words by 
using the dictionary; and (3) he then writes all of the mis- 
spelled words, which he has thus detected, correctly spelled 
in separate lists; (4) at this point the pupils' papers are ex- 
changed, the teacher spelling all words and the pupils mark- 
ing those found to be misspelled on the papers; and finally 
(5) when the papers are returned to their owners the addi- 
tional misspelled words discovered should be added to their 
individual lists. 

The pupil's spelling is scored by the teacher on the basis 
of the correctness of his doubts as well as upon the number 
of words spelled correctly. In the absence of a scientific de- 
termination of the relative significance of spelling of words 
correctly and doubting correctly the same value is assigned to 

' Lull, Herbert G., "A Plan for Developing a Spelling CoDaciousness"; 
n EbiRenforjr School Journal, vol, IT, p. 355, 



110 EDUCATIONAL TESTS AND MEASUREMENTS 

each. The pupils are scored both for doubting words apelled 
correctly, and for not doubting words spelled incorrectly. 

Making associations automatic. Getting the pupil to 
spell a word correctly is only the first step. There must be 
attentive repetitions of the correct associations until they 
have become automatic. In this respect spelUng is similar 
to arithmetic. In the teaching of the operations of arith- 
metic drill occupies a prominent place, but in the case 
of spelling our teaching has been confined primarily to 
testing pupils. Requiring pupils to write each misspelled 
word ten or twenty times is an effort to provide practice. 
Such practice is unsatisfactory. After the first writing of the 
word the pupil probably copies. Hence the repetitions are 
not attentive. 

Practice upon words which are misspelled by a majority 
of the pupils can be secured by having them recur in the 
spelling lesson from day to day. This plan provides the same 
drill for all pupils regardless of whether they misspell the 
word or not. In this respect it is unsatisfactory. 

Courtis's spelling practice tests. In order to provide each 
pupil with the practice he needs Courtis has devised a series 
of practice tests in spelling similar to those for arithmetic. 
A lesson of the practice tests consists of a story with the 
words to be spelled printed in heavy-faced type. The pupil 
is directed to study these words. On the reverse side the 
story is printed with the spelling words omitted. A speci- 
men lesson follows. 




SPELLING 



DETROIT PUBLIC SCHOOLS 
Practice Tests in Spelling 
Lesson No. i. — A Trip to a Great City 
On the Sixth o( December we left the Green MoimtainE of Ver- 
mont to visit an uncle, who had just returned from a long voyage. 
He had acquired immense wealth by deals in leather and 
turpentine. We were glad to arrive in Chicago on the elevenOi 
for we had often dreamed of this visit. But how deceiving are 
dreams. Instead of being met at the dejxit as we expected, we 
found we did not know a single man or woman in all the great 
crowd in the station that evening. 

We had to argue with ourselves to try to understand how any- 
one could disappoint us on such an important occasion. 

Instructions: Read the paragraphs above and study the spelling 
of the words printed in heavy type until you can fill in all the 
blanks on the other side of this sheet correctly in four minutes. 

DETROIT PLTBLIC SCHOOLS 

Practice Tests in Spelling 

Legaon No. i — A Trip to a Great City 

On the of December we left the 

Green of 

to visit an uncle, who had just re- 
turned from a long 

He had acquired immense wealth by deals in 

We were glad to arrive in 

on the for we had often 

dreamed of this visit. But how 

are dreams. Instead of being met at 

the depot as we expected, we found we did not 
know a single man or woman in all the great 

crowd in the station that 

We had to with ourselves 

to try to how anyone 

could us on such an 



1 



Scores Number Tried Number Right. . 

Name Grade Boom 



142 EDUCATIONAL TESTS AND MEASUREMENTS 

Besides providing eacli pupil with the practice which he 
needs, and thus for individual progress, the tests have the 
added advantage of having each word appear in an appro- 
priate context. A definite time is allowed, and this has been 
chosen so that a pupil must be able to spell the words auto- 
matically when he does the test satisfactorily. ^^M 

QUESTIONS AND TOPICS FOR INVESTIGATION ^^M 

1. In what respects does the problem of meaauring ability in spdling 
differ (1) from the problem of measuring ability in arithmetic and (8) 
from the problem of rocBsuring ability in reading? 

2. Compare the metliod Ayrea used in selecting words for his scale with 
the method used by Buckingham in selecting words for his scaie. 
Which method is the better? W.hy? 

3. Measure the spelling ability of the pupils of your class by means of a 
timed sentence teat and then dictate the test words as separate words. 
Compare the two seta of acorea, 

4. Teachers frequently tell with pride that all but two or three of their 
pupils make a "grade of 100" on a certain test. Should the fact be a 
cause for a feeling of aatisfaction? Were the pupils really tested? 

fi. Dictate the words for the next spelling lesson before the pupils have 
studied them. Have each pupil moke a list of the words which he mis- 
Bpells and also of the particular misspellings which he has used. Direct 
the pupils to base their study upon these lists. 

0. How could you determine whether the method suggested in question 

7. Construct a series of "timed sentence spelling tests" for the ele- 
mentary school, using suitable words from the Ayres Scale. 

8. How do the scores obtamed by usbg Starch's Tests differ in meaning 
from the scores obtained by using a test made from the Ayres Scale? 

B, Why docs a test of easy words tail to give a measure of spelling ability? 

10. Why must the relative difficulty of the words of a test be known if 
accurate measures are desired? 

11. Make a study of the ways in which your pupils misspell. Also asco^ 
tain the causes for these misspellings. 

It. How can you use this information in making your teaching of spelling 
more effective? 

BIBLIOGRAPHY 

Only the most important references are given here. Additional refop- 
ences will be found in the footnotes of the chapter. 

1. Ayren's Spelling Scale. Copies may be purchased from the DiviuoD 
of Education, Russell Sage Foundation, New York City. 





SPELLING 14S 

IlErnHEVCBS; Ayres, L. P. A Measuring Se(^e for Abiliiy in SpfUing. 
(Division of EducaUoa. Russell Sage FouodatioD, New York City.) 

Sears, J. B. Spelling Efficiency in the Oakland SchooU, Report of 
the Oakland Spelling InvpsliKation of October. 1914. Oakland, Cali- 
fornia, Board of EMucaLion, 1915. (Bureau of Information, Statistics, 
and Educational Research, Oaklaud School Department, Fublicatioa 
no. 1, 1916.) 
S. Buckingkam'i Spelling Scale. The scale ia not published separaleiy, 
but TDsy be found in the monograph, ^ Spelling AbUity; Ita Meamre- 
menl arid Di^trilniHon. by B. R. Buckingham. This may be obtained 
from the Bureau of Publications. Teachers College. Columbia Uni- 
versity, New York City. (Teachers College Contributions to EMuca- 
tion, no. B9.) 

References: Lewis, E. E. " Teetiog the Spelling Abilities of Iowa 
Sdiool Children by the Buckingham Spelling Testa "; in Elementary 
SrfooiJoumo/, col. lO.pp. S56-84. (June, 1916.) 

Thorndike, Edward L. " Means of Measuring School Achieve- 
ments in Spelling": in Educational AdminiftraHon and Svpertriiion, 
vol. I. pp. 306-12. (May, 1915.) 

Tidyman, W. F. "A Deseriplive and Critical Study of Bucking- 
ham's Investigation of Spelling Efficiency "; in Educalional Adminis- 
tratiim and Supertrimm, vol. 3. pp, 290-304. (May, 1916.) 

Buckingham's Spelling Tests were used in the Survey of the Gary 
and Pre vocational Schools of New York City. See Senenteenth Annual 
Report of ike CHy Superintendent of SchooU, New York City, 1914-16. 

3. Caurtia'a Sta/idard Research Testa in Spelling. Copies may be obtained 
from S. A. Courtis. 88 Eliot Street, Detroit, Mich. 

i. SiaTch'i Spelling Teale, For these lists and the directions for giving 
thera consult Educational Measurements, (jy Daniel Starch, chap. vi. 
(The Macmillan Company.) Copies of the tests may be obtained from 
Daniel Starch. Madison. Wisconsin. 

RBrBRGNCE: Starch, Daniel. " The Measurement of Efficiency in 
Spelling, and the Overlapping of Grades in Combined Measurements 
of Beading, Writing, and Spelling "; in Journal of EdueaOonaC Pay- 
eliology, vol 6. (March, 1915.) 

GENERAL REFERENCES 

Ayres, Leonard P. The Spelling Vocahulariea of Personal and Biisinesi 
LettfTs. (Russell Sage Foundation. Bulletin.) 

Cook and O'Shea, The Child and His Spelling. 

Eouaer, David J. "The Relation of Spelling Ability to General Intel- 
ligence and to Meaning Vocabulary "; in EinnenlaTj/ 6'cAooi Journai, vol. 16, 
pp. 190-99. (December. 1915.) 

Jones. N. Franklin. Concrete Examination of Ike Material of English 
Spelling. (University of South DakoU, Bulletin, 1913.) 



144 EDUCATIONAL TESTS AND MEASUREMENTS 

Otis, Arthur S. "The Reliability of Spelling Scales, involving » 'De- 
viation Formula' for Correlation"; in School and Sociely, vol. 4, pp. 
876-83, 71ft-22, 660-56. 799-96. {October 28, November 4, 11, 18, 1916.) 

Rice. J. M. "The Futility ai the Spelling Grind"; in The Forum, vol. 
43. pp. 163-78, 409-19. 

Tidyman, W. F. "A Critical Study of Rice's Investigation of Spelling 
Efficiency"; in Pedagogical Saninary, vol. 22, pp. 391-400. (Septem- 
ber, 191B.) 

Wallin. J. E. W. Spelling ESUdemry in Relation to Age, Grade, and Scr, 
and the Question of Tramjer. (Warwick &, York, Baltimore, Md.) 




CHAPTER V 

HANDWRITING 
I. The Problem of Measuhement m Handwkitinq 

Handwriting is measured in several ways. Frequently 
the teacher watches the pupil write and passes judgment on 
one or more of such factors as position, movement, and 
apparent ease of production. Many teachers examine the 
script or specimen of handwriting, seeking to discover the 
merit of the factors which enter into its production. Other 
examinations of the specimen seek to estimate the general 
merit or quaUty of the handftTiting, ignoring those factors 
which enter into its production. These estimates are ordi- 
narily made with no specific factors in mind other than 
rather vague ideas of good appearance, beauty, legibility, a 
good business hand, and the like. These methods result in 
inaccurate and unsatisfactory measures of handwriting. 

Another method of measuring. The problem of measm«- 
ment in handwriting differs fundamentally from that of any 
of the school subjects treated in the preceding chapters. 
The answer to an example or the spelling of a word is either 
right or wrong. Typical specimens of handwriting cannot 
be definitely classified as either legible or illegible. Instead 
there are degrees of legibility and these cannot be easily 
defined. A specimen of handwriting which is read with 
great difficulty, if at all, by one person, b read with con- 
siderable ease by another. Thus, legibility depends in part 
upon the reader as well as upon the form of the handwriting. 



4 



146 EDUCATIONAL TESTS AND MEASUREMENTS 

Many features of form, such as the size of the letters, the 
form of the letters, the spacing of words, the kind of pencil 
or pen used, and the like, affect legibility. Legibility con- 
cerns the reader. From the standpoint of the writer the 
speed and ease of production are the significant features. 
The problem of measurement will be treated in more detail 
under the topics of speed and of quality. 

II. Handwriting Scales 
Measuring speed. Speed is measured by requiring the 
pupil to write for a specified time under standard conditions, 
and then counting the letters written. The speed is stated 
! in terms of the number of letters written in a minute, or the 
average time of writing one letter. When measuring speed 
a two or three-minute period should be allowed, taking the 
time by means of a stop-watch or the second-hand of an 
ordinary watch. Obviously the teacher should see that 
all pupils are well provided with good pen-points, ink, and 
paper, unless they use pencils, in which case there should be 
n sufficient supply of well-sharpened pencib. The directions 
to the pupils should be given with due precautions against 
misunderstanding . 

Pupils should be asked to WTite a suitable selection which 
they have memorized. To guard against lapses of memory, 
the pupils should be asked to repeat in concert the selection 
to be used. If convenient it is well to provide each pupil 
with a printed or typewritten copy of the selection. When 
this cannot be done the selection may be written on the 
blackboard where all can see it. The selection should contain 
no words which the pupib cannot spell readily. It is well to 



w 

I have 1 



HANDWRITING 147 



have them practice writing the more difficult words before 
the test is begun. Do not use material which the pupils must 
compose as they WTite, for this would be worthless in testing. 
The rate of writing unfamiliar material from a printed copy ^ 
will vary with the pupil's rate of reading and so will not give 
a true measure of speed. Dictated material should be used 
only when the teacher wishes to control the speed, not 
when speed is to be measured. 

Selections for the speed tests. Different investigators 
have required pupils to write different material. Several 
have used the first line or the first stanza of the poem " Mary 
had a little lamb." "' Sing a song of sixpence " has been used. 
Other sentences which have furnished copy are: "JoUy 
kings bring gifts while happy maids dance." "A quick 
brown fox jumps over the lazy dog." ^ " Then the carelessly 
dressed gentleman stepped lightly into Warren's carriage 
and held out a small card. John vanished behind the bushes 
and the carriage moved along down the driveway." ° In the 
Cleveland Sm-vey the first three sentences of Lincoln's 
Gettysburg Address were written, and Ayres has used this 
selection in the " Gettysburg Edition " of his scale. In sev- 
eral surveys the pupils were allowed to write any familiar 
stanza of a poem. The chief principles to bear in mind in 
selecting materials are: first, to use material in the lower . 
grades which will not furnish difficulties in spelling and 
remembering; and second, to use material which will be 
uniform in all classes which are to be compared. 

' This seotente was used in securing specimens for tte Frecroau Scale. 
It contains all nf the Icttem of the nlpbabct. 

' These sentences were used in securing the specimens (or the Thomdike 
Scale. 



148 EDUCATIONAL TESTS AND MEASUREMENTS | 

The following directions are representative of many which 
have been used with good results: — ■ 

■' Write the stanza of the poem which you have learned. When 
you have written the stanza, write it again, and keep on writing 
until I tell you to stop. Write as well as you can and as fast as 
you can. Write on one side o{ the paper. When you fill one 
page, use another. Place your pajjer in position and see that 
your pen and ink are ready. When I say 'ready,' ink your 
pen and place your hand in position to write, but do not begin 
until I say, 'Start.' When I say, 'Stop,' all stop at once and raise 
your hands so I can see that you have stopped. Remember: 
Fast work and good work. Ready! Start! " At the end of three 
minutes, " Stop! " 

Measuring qualily ; use of scales. The " quality " of hand- 
writing is generally measured by means of a scale, which 
consists of a number of specimens of handwriting arranged 
in order of merit or legibUity, The scales in most general 
use are the ones constructed by ThorndUce ' and by Ayres.' 

Tkomdike constructed his scale on the basis of three 
characteristics — beauty, legibility, and general merit. The 
degree of these characteristics represented in the specimens 
of the scale was determined by the consensus of opinion of 
competent judges. Ayres constructed his scale on the basis 
of legibility alone. He defined legibility in terms of ease of 
reading. That specimen was defined as most legible which 
was read most easily. The numerical values of the speci- 
mens of the Thorndike Scale range from 4 to 18, one or 

' Thorndike, E. L. "HandnTiting"; in Teachera College Record, vol. i, 
no. 3. (March, 1910.) 

' Ayres,J..P. A Scalefor MearuTinslhe Handarriiing of Sekool-CkUdren, 
{Russell Sage Foundation, Bulletin do. 113.) Ayres has also constructed 
an Adult Scale and the " Gettysbuig Edition." In this book the tenu, 
Ayres'a Scale, refers to the " Three Slant Edition " unless otbeiwia^ 



cwia^ J 



r 



HANDWRITING 
Quality II. 



• Quality 9. 

fry iWtaaA *uut'fr«*i |»':itJwVKfl (ii/i-in<t"*^'*«.*»utt- 

Quality 8. 

■•■^ffvjiji- QJle^-^.a eXffXjJV,, Tio. <iruJV^ljJW, . 31 
Fio. 7. A Section of the Thohndike HANDWBirraa Scale 

{Btduced t id lin.) Qiulity B of thii icnle 19 approiimstely equal la qunlily 40 of tt 
1 Ayni Snile. Qiulit; 11 ii belter thui tbe Arrci SO. Su Ubie a[ rEklLve viluei. o 



150 EDUCATIONAL TESTS AND MEASIIBEMENIS 



40 













r HANDWRITING 1£1 ^^M 


50 




Jm</v^ ^ ^-M^tLoi, a^rn^JU dJ^J, 


JMjiiAjM/ict pupJU- Jvii of- /yviM- 




1 AtB£B HaNBWRITINO ScaLE 
ttcnwit [he gnidel Mtun rf tbo «aU. 


J 



152 EDUCATIONAL TESTS AND MEASUREMEN 

more specimens being given for each degree of quality. 
section of the Thorndike Scale is shown in Fig. 7. 

Ayres's Scale consists of three types of specimens, verti- 
cal, semi-slant, and full slant. Each of these three types is 
represented by eight degrees of quality to which are as- 
signed the numerical values 20, 30, 40, up to 90. In using 
this scale it must be remembered that these values are not 
the same as the per cents used in reporting " grades." A 
section of the Ayres Scale is reproduced in Fig. 8. 

Ayres^ later devised a scale from specimens of handwriting 
written by adults. Trained judges used the "Three Slant 
Edition " in selecting the specimens and in determining their 
values. This "Adult Scale" is similar to the "Three Slant 
Edition" in its general plan. Very recently (1917) Ayres 
devised a third scale, the " Gettysburg Edition." This scale 
differs from the others in the following particulars. It has 
one specimen for each step. The specimens are written on 
ruled paper. The copy is the same for all specimens. In 
addition to the standardized specimens of the Scale, this 
edition has directions for securing specimens from a class 
and for scoring these specimens. It also furnishes standards 
for speed and quality of handwriting for the grades above 
the fourth. Ayres asserts that the purpose of these changes 
is "to increase the reliability of measurements of hand- 
writing." 

Johnson and Stone ' have made a scale similar in general 
plan to the Ayres and Thorndike Scales, but based on several 

' Ayres, L. P. A Scah for Meaminng the Handwriting of AduUn. (Rusaell 
Sage Foundation. Bulletin E 138.) 

• Johnson, George L.. and Stone, C. R. "Menauring the Quality of 
Handwriting"; In The Ekmeniary Sehaal Journal, February, 1B18. 



HANDWRITING 



w 

P factors, including movement and a detailed analysis of 
I legibility. Each specimen of the scale is accompanied by a 
legend which states its defects and merits in terms of the 
analysis appended, which includes seven factors: — letter 
formation, uniformity of slant, uniformity of alignment, 
spacing, quahty of line, size, and degree of slant. 

Breed and Downs, ' offer a handwriting scale obtained from 
a survey of the handwriting of the public schoob of High- 
land Park, Michigan. The specimens collected were scored 
I by using the Thomdike Scale. Specimens were then selected 
for a five-step scale for each of the following grades, 3d A, 
|l 3d B, 4th A, 5th A and 6th A. Values are assigned to these 
I' steps in terms of the values on the Thomdihe Scale. A 
standard for speed is ^ven for each grade. This scale f ur- 
L uishes an excellent example of what may be done in the con- 
' stmcting of a scale for local use from specimens collected 
'' from the schools concerned. Beyond this the scale has 
I nothing which makes it a rival of the other scales for general 

use. 
, Freeman's * Scale differs from the other scales in an im- 

portant respect. It is in reality five scales, one for each of 
the following characteristics of handwriting: uniformity of 
slant, uniformity of alignment, quality of line, letter forma- 
tion, and spacing. These five scales are now printed on one 
sheet of paper or chart, and each scale Is called a divbion. 

■ Brent, F. S., and Downs. E. F. "Measuring and Standardiziiig a« 
Handwriting in a School System"; in EUia^alarg Behool Joamal, vol. 17. 
(March, 1917.) 

' Freeman, F. N. The Teaching of Handwriting, (floaghlim Mifflin Com- 
pany, 1915,) Also, "An Analytical Scale for the Judging o( Handwriting"; 
it) Tlui EtetanntaTg Seboot Jourrud, vol. 15, p. 132. (April, 191S.) 



154 EDUCATIONAL TESTS AND MEASUREMENTS I 

This scale is veiy useful in the diagnosis of the handwriting 

of a pupil. 

The score card for detailed analysis. The score card re- 
presents another attack upon the problem of measurement. 
Such instrumenta as the Ayres and the Thomdikc Scales do 
not require that the user make a detailed analysis of general 
merit or quahty. The score card requires that the essential 
elements of handwriting be selected and each assigned a 
value. The score card devised by Gray • weights the value 
of each of the essential elements of handwriting so that the 
highest value which can be assigned to slant is 5, while 
spacing of letters may receive 18, neatness, 13, etc. (See 
Fig. 9.) The use of this score card by teachers in their 
grading of handwriting would undoubtedly tend to direct 
their attention to the individual needs of the pupils. So far 
there is no evidence to show that the use of the score card 
will result in more accurate measures than the use of any 
one of the scales. Some claim that the elements of hand- 
I writing have not been correctly evaluated. However, it 
has the advantage that its use trains the user in the analysis 
of handwriting. Gray well defends the device by saying 
that agriculturists have long used such score cards to secure 
very satisfactory and accurate results in judging grain and 
live stock. 

The scales classified as to use. The instruments for 
measuring quality of handwriting may be classified ac- 
cording to their use into two groups. The Thorndike, 
Ayres, Johnson -Stone, and Breed-Downs Scales are used 

' Gray, C. Trumnn. A Seore Card for the ii eiuureJiienl of Hand^ 
leriiing. (Bulletin of the Univeraity of Texas, no. 37. July, WIS.) 




HANDWRITING 



Omdt 

8ampltyuaitr... 



-- 


Pn-/«" 


5«mi 


■ 


' 


> 


• 


• 


■ 


' 


■ 


• 


10 




■= 


,. 


1. 


1 H Tin 


T 
8 

18 
(M) 


'' 






" 










"■ 


■■ 










2 Slut 




Sis 








VnUormity 

TooUtupirt 

fl. Spacing or word* 




Unllomdly 




























■ 









FlO. 0. SUNDABD SCORB CaBD FOB UEAaUBINQ HAin>WBinNa 
(DeviMd br C. T. Gnj) 



156 EDUCATIONAL TESTS AND MEASUREMENTS 

to measure general quality as a unit. The Freeman Charts 
and Ciray Score Card are used to analyze general quality 
into its factors. Each factor may be measured and the 
values placed on the several factors may be added together 
to furnish a general measure. There is no evidence that 
such a general measure is any more accurate than one 
which is more easily secured by means of one of the gen- 
eral scales. Hence it is probable that the first group of 
scales will remain in use where general measurement or sur- 
veys are desired, and that the Freeman Charts and Gray 
Score Card will be used for what we shall call diagnostic 
measurement. 

Methods of using scales. The quality of a specimen of 
handwriting is measured by comparing it with the speci- 
mens which compose the scale. Its value is the scale value 
of the specimen of the scale which it most nearly resem- 
bles. Several methods of comparing specimens with a scale 
are in vogue. When a teacher works independently the 
best method is the sorting method described by Ayres,' as 
follows : — 

The scorer sorta into separate piles all of the papers to be rated, 
putting in one pile those which he judges to be of quality 20. in 
another, those which he judges to be of quahty 30. and so on for 
all of the different qualities. He then carefully compares all of 
the papers in each pile with each other and with the samples of 
that value reproduced on the scale, so as to make sure that he has 
not included in the pile any samples that might more justly be 
assigned to the next higher or lower piles. 

Another method, which requires less time, but does not 
secure as good results as the sorting method, is the ascend- 

' Ayres, L. P. A Scale for Measuring the Quality of BandimUng of 
AduW). (Eussell Sage Poimdation. Bulletin E 138, p. 9.) 



"I 



HANDWEITING 157 

ing'descendmg method. This requires that the specimen 
being examined be moved from the lowest step on the scale 
toward the higher steps mitil the judge decides the specimen 
on the scale to be superior to the specimen in hand. Then, 
beguming at the upper end of the scale, the specimen must 
be compared with the steps of the scale until the judge de- 
cides the specimen on the scale to be inferior to the one in 
hand. The specimen in hand then receives the rating rep- 
resented by the point midway between the step of the 
scale reached in the ascending and the step reached in the 
descending series of comparisons. For example, working 
upwards on the Ayres Scale the judge stops at 70 and 
working downwards at 60. The specimen in hand would 
then be rated 65, 

This method may be varied by rating all of the specimens 
of a class by working up the scale, recording the judgments 
on the back of the specimens rated. Next, rate all the speci- 
mens working down the scale, and record the judgments on 
the face of the specimens. Finally, the judge goes over the 
specimens and establishes the midway point for each speci- 
men as the true rating. 

Whenever three or more persons can work together in 
scoring specimens the results may be expected to be more 
satisfactory than those secured by independent work. All 
the members of the group should examine the specimen of 
writing and confer concerning the rating it should receive. 
A majority of the group must agree before a score is as- 
signed to the specimen. 

A method which will require more time, but one which 
will secure more accurate results than the methods described 



158 EDUCATIONAL TESTS AND MEASUREMENTS 

above, ia one in which a group of three or more persons score 
the specimens independently, using the sorting method. 
Then the scores assigned by all of the judges to a speci- 
men are averaged and the result taken as the true score 
for that specimen. The accuracy of the resulting scores 
will increase with the size of the group of judges. All 
of these methods may be used with either the Ayres or 
Thorndike Scales, and with modifications with the other 
scales. 

The results of a number of investigations have shown that 
careful training in a relatively poor method of using a scale 
will produce a marked improvement in the accuracy of the 
scores. It should follow that careful training in the use of the 
sorting and group methods would produce highly accurate 
results. 

Measm:einent for diagnosis. In using Gray's Score Card 
and the Freeman Scale, measures of each of the several 
factors concerned in a pupil's handwriting are secured. A 
record of successive measiu-ements show the progress or de- 
cline in the general quality of the pupil's writing, and thus 
furnish a basis for class marks. But far more important, it 
will show just which abilities have not been sufficiently 
improved. These abilities will then be the points of attack 
for the teacher and pupil in their subsequent work. For 
example, a record as shown on the Gray Score Card might 
indicate that a pupil's handwriting was suffering chiefly 
because of poor letter formation. A closer inspection would 
show that letter formation was very often defective in two 
items, letters not closed and parts omitted. Such diagnosis 
reveals a definite problem for the teacher. 




HANDWRITING 159 

Use of the score card. The score card (see page 155) may 
be used for a pupil, or a class. If it is used for a pupil, 
the numerals along the top may be taken to indicate weeks, 
months, or other intervals. In the colunm under the nu- 
meral 1 the first scores of a pupil's handwriting should be 
entered. A month later a second series of scores should 
be entered in the column headed by the numeral 2. The 
next month another series of scores should be entered un- 
der numeral 3, and ao on. At the close of a term there will 
appear a very useful record of the child's experience in the 
learning of handwriting. This use of the score card Gray 
calls a clinical study. 

If the card is used for a class, the numerals at the head of 
the columns stand for the specimens written by the several 
pupils of the class. TJie totals at the bottom will fm-nish an 
interesting comparison of the ability of the pupils. Each 
pupil knowing his number can tell how he stands in rela- 
tion to the other members of the class. If a new score card 
is posted each month, a pupil may see whether he is gaining 
or losing in his position in the class. If he is losing, he will 
be inclined to seek the reason. He may see that his neatness 
has a low score. This furnishes a strong incentive for work 
to improve in neatness. Teachers and supervisors might 
compare their records. The use of the card may be varied 
by training pupils to score their own or others' handwrit- 
ii^, or by one teacher calling on another teacher to score 
the handwriting of her pupils. 

The individual record card shown in Fig. 10 is a very 
simple form of a score card designed to be used with the 
Freeman Scale. 




160 EDUCATIONAL TESTS AND MEASUREMENTS 



Pupa's Name. 






..City. 












pmtiriia 


Stand trM 


Third trial 

D"" 


£"*.™ 


? ? s 


Chart! 
(3Lu,l) 










^ 


Lt 


Chut II 














Clartlll 

(Qmdlty of ItB.) 






















; 1. 




^ta») 
















ToUl (tiIub tin 
Free'm™ Bel.) 












QuiJltj (islm cm 
Ay™ ScBle) 












*(L.lUl.|Wrmln«.) 













FlO. 10. iHDIVIDnAl. R&COBD C^BD, FbBEK 



The Freeman Scale.* The first of the five divisions of the 
Freeman Scale represents three degrees of uniformity of 
slant. In using this division, as in using the nest division, 
judgments will be made more easily if a slant and alignment 
gauge is used.' The second division represents uniformity 
of alignment. The user must be careful to note that letters 
which are close together show deviations in alignment more 
prominently than letters written farther apart. 

' DeacribedmorefulIybyFreemanm r«acAinif<t/ffondim(!nif. (Hougb- 
tou Mi fflin Company, 1914.) 

' Freeman, P. N. The Teaching of Handmriling, p, 151. The slant gauge 
consists of three rows of parallel lines. The lines in one row are vertical 
andineachottheolherrowathelineaarosct atauaifonaalant. Thcaiiga- 
ment gauge confiista of one straight line four or five inches long. These 
lines may be drawn on transparent paper and placed over a specimen of 
handwriting to assist in determining the deviations from uuitormity in 
alftot and alignment. 



HANDWRITING 161 

The third division shows the quality of line or stroke.. 
A reading-glass will aid in judging with this division. The 
fourth division is intended to measure letter formation. 
Freeman describes eight illegible forms of letters which 
should be counted as errors. Two principles should control 
here: first, whatever slant or type of script the pupil may 
use, consistency to that choice should be maintained; and 
second, no letter should vary from its recognized form so/ 
much as to be easily mistaken for another letter. The fifth 
division shows different kinds of spacing. Letters may be 
crowded or spread too far. The same applies to words. 

In each division the three degrees of excellence are given 
scores of 1, 3, and 5 respectively. The intermediate values 
of 2 and 4> may aJso be used. If the old edition of the scale 
is used, the scores assigned to the specimens of letter forma- 
tion are 2, 6, and 10. Freeman suggests that the specimens 
be scored by using the score for letter formation as placed 
on the new edition of the chart, and then doubling these 
scores in making up the total score. 

Using the Freeman Scale. This scale may be used for 
measuring specimens from all members of a class, but fre- 
quently it is used to measiire specimens written by those 
ranking conspicuously below tlie average ability or below 
the standard abihty of the class. This needy group of 
pupils may be selected by the teacher's unaided judgment, 
but preferably by the use of the Thorndike or Ayres Scales. 

Freeman ^ has recently issued the following suggestion 
for using his scale : — 

' Freeman, F. N. Experimenlal Educat, 
Company, 1916.) 



I 



I 




16€ EDUCATIONAL TESTS AND MEASUREMENTS 

The specimen to be judged is graded according to each category 
separately and given the rank of the specimen in the chart with 
which it most nearly corresponds in euch case. The total rank is 
calculated by summing up the five individual ranks. Thus, if 
letter formation is given double value, the lowest possible rank is 6 
and the highest possible rank is 30 (5 + 5 + S + 10 + 5), and 
the range is H. 

Several precautions are to be observed in making the judgments. 
The value of the method rests upon the fact that different features 
of the writing are singled out. one at a. time, and graded by being 
given a rank in one of only three steps. The differences between 
the steps are marked, and the ease of placing a specimen should be 
correspondingly easy. 

This method implies, however, that 

(!) The attention is fixed on only one characteristic at a time. 

(2) The judgment on one point be not allowed to influence the 
judgment on the other point. 

(3) The same fault be counted only once. 

(4) General impreaaions be disregarded. 

The scores secured by means of the Freeman Scale should 
he saved to furnish a means of evaluating the results secured 
from instruction. The scores may be recorded on the speci- 
men, or better on an individual record card, such as shown 
m Fig. 10. The latter will be more convenient when the 
teacher wishes to examine a series of scores recorded at 
intervals over a term of several months. 

III. The Keli-^ilitt or Measureb and Scales 
Rate and quality contrasted. The measure of a pupU's 
rate of writing in terms of the number of letters written per 
minute is definite. . It is an accurate measure of the rate at 
which the pupil wrote when the specimen was secured. It is a 
true measure of the pupil's normal rate of writing unless 
the pupil wTote slower or more rapidly than he is accustomed 
to write. Thus in securing specimens care should be taken 



HANDWRITING 16S 1 

to liave the pupils write as nearly at their normal rate as 
possible. 

If we cannot be sure that pupils are writing at their nor- 
mal speed, we can at least use the standardized instructions 
for collecting specimens of handwriting. This will insure 
that the speed at which the pupils write is influenced by the 
instructions given to them in the same degree as the speed 
was influenced in establishing the standard. The importance 
of this safeguard can easily be established if pupils be 
directed to write a copy as rapidly as they can write it for 
one minute and after an interval be again asked to write the 
same copy as well as they can. There will in almost every 
case be a wide difference in the rates at which they write. 

The measurement of quality is different. Some teachers 
are skeptical of the measures of the quality of handwriting 
because they believe the scales have not been accurately 
■ constructed. It is easy for any teacher to criticize the scales, 
but it is very difficult for even an expert to improve on them. 
The construction of a perfect handwriting scale is a task 
which we must leave to the expert. Meanwhile we shall do 
well if we make full use of the available instruments. Other 
teachers are willing to accept the construction of the scales 
but doubt the accuracy of the scores obtained by using 
them. These may be answered by the work of Kelly,' 
Lewis,* and Gray,' who have shown that with equal train- 

' Kelly. F. J. Teachers' Maries, pp. OO-IOB. (Teachers CoUege Contri- 
butiona to Edutntiim, no. 66, 1914.) 

' Lewis. E. E. "The Present Standard of Handwriting in Iowa Normal 
Training High Schools": in School Administration and Supervision, vol. 1, 
no. 10, pp. 683-71. (December, 191S,) 

■ Gray, C, T. "The Training of Judgment in the Use of the Ayre* 
Scale for Handwriting"; in Journal qJ EducaUomU Prychology, " ' 
try, iei5. p. SS. 



le Ayre* I 

"J 



164 EDUCATIONAL TESTS AND MEASUREMENTS 

ing in the usual per cent method of grading and in meas- 
uring with the aid of a scale, the results of the latter method 
are more accurate. Lewis asserts that the use of a scale re- 
duces the variation almost one half over the ordinary per 
cent method. Kelly makes this significant statement: 
" Teachers have reduced the variability shown in the per 
cent method by practice at the expense of the children, 
while they have at the same time decreased their capacity 
for effective use of the standard scale." 

Accuracy of the scores. The accuracy of scores may be 
considered in either or both of two ways. If one person 
scores a set of specimens and then, after an interval, scores 
them again, his scores may be judged to be accurate or 
inaccurate as they agree or disagree. This lack of agree- 
ment is called individual agreement or individual varia- 
tion. Again several persons may score the same set of spec- 
imens, working independently. When their several scores 
for the same specimen are compared the amount of their 
agreement is called the group agreement or group variation. 

Several investigators ^ have attacked the question of the 
accuracy of the scores secured by the use of the scales and 
have found that there is a wide variation in the scores 

' Breed, F. S., ond Culp, Vernon, "An Application ftnd Critique of 
the Ayrcs Handwriling Scale"; in Scliool and Society, vol. 2, pp. 36-47. 
(October, 1916.) 

Manuel, H.T. "The Use of an Objective Scale for Grading Hand writing"; 
in The Elementary School Journal, vol. 15. no. 5, p. 269. (Januaty, 1915.) 

King, Irving, and Johnson, Harry, "The Writing Abilities of the Ele- 
mentary and Grammar School Pupils of a City School System Measured 
by the Ayrea Scale"; in Journal of Ed-ucalional Psychology, vol. 8, pp, 
614-20. 

Harvey. Nathan A. "The Use of Handwriting Scales " ; in The Amtri- 
can Schootmasler, October, 1916, p, SCI. 



HANDWRITING 165 

when several judges score the same specimens. These re- 
sults appear discouraging, but it is significant that these 
investigators do not condemn the use of the scales as their 
results would seem to warrant. On the contrary, there ap- 
pears a faith that scales will be used successfully in the 
future. This is a confession tliat the results secured have a 
limited application. 

Training in using the scales. The teacher's ability to 
make comparisons depends on technique and training. The 
technique for using scales has been described under the cap- 
tion of " Methods of Using the Scales." It is significant that 
none of the investigators referred to used what seems to 
be the best technique in rating specimens of handwriting. 
Neither did they use well-trained judges. The two must 
go together. A teacher might make Inaccurate comparisons 
while using a good method, if poorly trained in that method. 
The most accurate results will follow from the teacher using 
the best method after being well trained in the use of that 
method. Furthermore, there are several methods of train- 
ing. 

Thomdike • has proposed a method of training judges in 
the use of the handwriting scale. He furnishes fifty speci- 
mens of handwriting whose value has been determined. 
The teacher scores each specimen, without referring to its 
true value. She then compares her score with the true value 
and notes her error. This is done with each of the fifty speci- 
mens. After scoring the fifty specimens once, they are to 
be taken up in random order and all scored again as before. 




ting"; ^^J 



166 EDUCATIONAL TESTS AND MEASUEEMENTS 

Each time they are scored there will be some gain in the ac- 
curacy of the scores. Hurt '■ found that three weeks of this 
training enabled seventeen of twenty-one judges to bring 
theb individual variation within usable limits. 

Hurt carefully tested the effects of several methods of 
training in the use of the Tbomdike scale. He has shown 
that individual independent practice in the use of the scale 
reduces the individual variation of several ratings of one 
set of specimens. It was also shown that after two groups 
of judges had had several weeks of practice, the group which 
practiced two weeks longer succeeded in still further re- 
ducing their individual variation or variations with their 
own previous ratings. It seems probable that an individual 
can make his ratings more consistent by a long period of 
independent practice. But such consistency may not make 
one judge's ratings agree any better with the ratings of 
other judges if other judges are not equally well trained. 

Hurt used another method of training judges: One group 
of five judges had practice with informal instruction in the 
use of the scale for one month. The average of a small 
group of scores was token as the true measure of a specimen. 
These judges rated the specimens independently, but con- 
ferred between ratings. These conferences and the in- 
struction given were directed towards the reduction of 
group variation, hence they did not tend to make a judge 
any more consistent with his own previous ratings. Neither 
did the training have a lasting effect. Some judges ac- 
quired a greater proficiency in the use of the scale than others. 



HANDWRITING 16T 

This method of traming appears to be of doubtful value fc 
the teacher. 

Gray ^ tested the effect of practice with instruction in the 
use of the Ayres Scale, He gave three judges careful 
struction and practice for twenty weeks. At the end of ten 
weeks the group variation was a httle more than half of 
what it was the first week. The twentieth week found their 
variation reduced to one seventh of the first week's varia- 
tion. Gray saya: — 

Accuracy in gradmg writing by a scale may be produced by 
careful training in the use of the scale. In the past the assumption 
has been made that ability to grade expertly in a subject cdme with 
an expert knowledge of the subject. While the experiment does 
not disprove this aaaumplion, it indicates clearly that another 
avenue of approach to such expert ability b through a period of 
careful training. This implies that grading may be considered a 
field more or less by itself, and gives a glimpse of a type of work 
in education whose chief interest is the accurate use oj units of meaa- 
vremenl.* 

Relative values of the different scales. In comparing the 
Thomdike and Ayres Scales for the reliability of scores se- 
cured, Pinter ° finds the Thomdike Scale superior. Starch 
finds their reliability to be about equal, and Freeman asserts 
that the Ayres Scale is sUghtly more reliable. Johnson* found 
the Thomdike Scale to be better for use in the second and 
third and possibly in the fourth grades, but considered the 



• Pinter, R. " A Comparison of the Ayres and Tbonidike Handwriting 
Scales"; in JourTial 0/ EducaiiamU Psychology, vol. 9, pp. 525-31. (No- 
vember, 1914.) 

* Jubnsun, Joipeph Henry. "A Comparison of the Ayres and Thomdike 
Handwriting Scales": in North Carolina High Sehool Bulletin, vol. 7, pp. 
170-73. (October, 1»16.) 






168 EDUCATIONAL TESTS AND MEASTIHEMENTS 

Ayres Scale more reliable for use in the grades above the 
fourth. Freeman ' finds that his analytical scale secures 
results which are more reliable than the Ayres Scale. These 
studies are not and do not assume to be conclusive. So it 
seems fair to conclude that about the same variability may 
be expected in the use of these three scales. 

IV. Standabd Scobes 
f Freeman's proposed standards. After measurmg the hand- 
writing of a class and thus learning how well the pupils 
write, the teacher very naturally wishes to know how well 
they should be expected to write. Several sets of standards 
of attainment are available to answer this demand. These 
standards are given in terms of median speed of production 
and median quality. To compare the scores of a class with 
these standard scores the teacher will need to find the median 
of that class. 

Freeman ^ proposed the standards given in Table XIX. 
The medians for speed are expressed in terms of the number 
of letters written in one minute. The medians for quality 
are given in terms of the Ayres Scale. 

Table XIX. Standabdb pkopobbd bt Freeman 





SchBOigrada 




n 


III 


IV 


" 


VI 


Vil 


vm 


Quality 




4T 

48 


60 
56 


5S 
65 


59 

72 


64 
80 


70 







' Preemnn, F. N. Expmmenfo^ Education, p. 92. (Houghton Mifflin Com- 
pany, iai6.) 

' Freeman, F. N. In the Fourteenik Yearbook of the National Socid}/ for 
Ike Study of Edueation, part i, chap. v. 



m 

I To m 



HANDWItlTING 



To make these standards for quality intelligible to those 
ho have scores secured by the use of the Freeman or Thorn- 
dike Scales, these standards are translated into terms of 
the latter scales in Table XX. 






Table XX. Relative Vai 



J^reeman or 1 horn- 

:ited into terms of ^^^ 

V Three X>iffbbenT^^^| 





..^^. 1 




// 


III 


IV 


" 


VI 


VII 


VIII 


Ayres , 

Freeman 

Thomdike. , . . 


44 

17. B 
B.36 


47 
18.4 

9.76 


60 
10 
10.13 


eo 

10.76 


S9 
20.8 
11. S4 


64 
11.89 


70 
23 

12.60 



Table XIX would read thus : A second-grade class should 
have a median score for speed of 36 letters per minute, and 
a median score for quality of 44, when scored by the Ayrea 
Scale. Table XX reads, for quality alone, thus: A second- 
grade class should have a median score of 44 when scored 
by the Ayres Scale, 17.9 when scored by the Freeman Scale, 
and 9.36 when scored by the Thomdike Scale. 

In the " Gettysburg Edition " of his Scale, Ayres has given 
standards in the form of distributions of scores for the sev- 
eral grades. These distributions show what we may expect 

' The scores for the Freeman Scale are taken from graphs snd tables 
given m his The Teaching of Bandicriting. The scores tor the Thorndi'ke 
Scale are taken from The Meamremeni of Efficiency in fUading, ff'ril- 
ing, SpeUiTig, and English, (D. Starch, Madison, Wisconsin, 1914.) For 
other comparisons, see "Comparable Measures of Hand writing," hy L. 
8. Sackett, in School and Society. October SI, 1916; Joseph Henry John- 
son, "A Comparison of the Ayres and Thomdike Handwriting Scales"; 
in North Carolina High School Bulletin, 7 : nih-rS. October, 1916. The 
la given in these tables may not be statistically accurate, but 
I* accurate aa the present sUtus of measurement demands. 



170 EDUCATIONAL TESTS AND MEASUREMENTS' 

to find, but they do not necessarily show what we should 
find in an efficient school system. Adherence to the doctrine 
that the average of present practice constitutes a standard 
will not cany us far in improvement. Table XXJ shows the 
averages which Ayres gives for the "Gettysburg Edition." 



Tabu XXI. Standabd Scores r 
Edition " 


OB TH 


5 "Geo^bbubg 




Ontia 




II 


111 


IV 


V 


VI 


Vll 


VIU 






tA 


46 
58 


60 
64 


70 


58 
76 

















What ttiese standards represent Standards of attain- 
ment are determined by two considerations: first, they must 
be attainable by pupils under ordinary school conditions, 
and without the expenditure of an unreasonable amount 
of time and effort; second, they should be high enough to 
assure that the pupil will have sufficient skill in writing to 
meet the demands which will be made upon him. These con- 
siderations are emphasized by the facts that only a limited 
amount of time is available for the teaching of handwriting 
ui tlie ordinary school, and ttiat after practice has progressed 
for a time it does not bring as large returns as it did in its 
initial period. 

The first of these considerations has been met by esamining 
the handwTiting of thousands of children, gathered from all 
parts of our country. Freeman used the results of the scoring 
of about five thousand specimens from each of the seven 



^^^P HANDWRITING 

grades. These specimens were selectetl from a larger number 
of specimens which were collected in fifty-six large cities 
of the United States. He found that the average of the 
scores of the upper half of these specimens gave scores for 
speed and quality which are approximately the standards 
he proposes. In checking up the second consideration, Free- 
man mvestigated the demands which are made upon those 
who are employed in several large commercial houses. The 
returns from this investigation, together with the results of 
the other investigation, indicated that the standards as pro- 
posed are but little more than the minimum essentials. 
Moreover, Freeman estimates on good evidence that these 
standards can be attained with an expenditure of not over 
seventy-five minutes a week. 

Other evidence as to standards. Table XXII and Table 

XXin give the results of a Dumber of widely-scattered 

Table XXII. Median Scores found (Speed) 




«eA«JGrad« 


ft 


n 


Itl 


IV 


" 


VI 


Fit 


,,„ 


Cl 1 .!> 


iio.e 
as 


1" 


t 


so 


70 




83' 


4J40 


loWUKhDol.' 

BUrch'i BKmd»rd> ■ 


> Judd.ClurinH. M«m.ri«,lkrWo,tafll,.PMira^hyU.. (RcpMl. Survey Commitle. 

> Asfabiiiigb. E. J. Himdtnling of I mix Rclml Ciildrn. (Univerdt; of Ia»a. Eitcniidn 
Divuioa. Bulletin no. IS, Murch, IBIfl.) 

1 Sun^, D. Thi MMtwwmml .tt EM^^l ■- lU^iiw. Wrilim. Spdli^,, and En,li,h. 
(Univenlty oF Wuconiin, mi.) 

ari,. (Kan«.Stiit<;Norn«lScliwl. Enipnri».IC.™v) 

1 Fr«m.B, F. N. F,mH«mll, n-rl^d, q/(A. N^imd S«.ri, far (*. Sl^J„ ^ fiiuoilion. 
[Hitl. (lltU.) 



172 EDUCATIONAL TESTS AND MEASUREMENTS 



Table XXETT. Median Scohes foijnd (Qualitv) 





«.w»„.„ 


?' 




" 


n, 


/r 


" 


„ 


t-// 


!■//■/ 


Smlf 


m 


Clersbnd' 


si:: 

8.2 


sals 

42 

10.3 
0.75 


45.8 
8.8 


1' 

8.9 
10:76 


!■■ 


1 


70' 


AjTB 


!ffi,3S7 


awch'.8Uiiri.rti'.. 

Fltty-«iicltlM' '.'..'.'. 

(AyrW Bale) 
Salt Uke Oilj • 

TteemAi,', StaiidHrdi 
(TliomdikB'j. Scile) 


34^000 

1,40{. 

i,«n 



' Judd, Cburlea H. tfauurin; (^ Ifoi'l qT (Ac FubUc 
iJctce of the Gevcliind FounditiDD, IBlfl.) 

< AibbBUgh. E. J. SawlKWIiii^ qrlouaSctMJCj^iUrci 
ioD Diviiion, BuIL^Iq no. It, Horch, IBIS.) 

> Stircb, D. TA; Jfiiuursmini tTi^IRncmi' <<> Awfiii 
Cnivemty of Wucaniin. 1914.} 

* DcViiH, J. C Seauul Annwil Aijwrl itT Buwau iif 
Ilandantt. (Ksiuu StitF Nocnul Srbaal, Empoiis, Kaiu 

> Pmnun, P. N. FtM"M«lK ria>4» 
ion. part I. (ISI8.| RcvUed mEdisni u 
locislv/w ItcSluJv 4/ eiucutiDn, put I, (IS1«.J 

' Report a} a Sarmi ij Iht Sdiait a! SrJi LaU Citu.TJtah. (lOlS.) 
' &)»rl ri/'s Sunsii VIAe Sdvxit^ BiMe, Moni., cluip. ii. (1814.) 



Writint, Spetlins. and Enilish. 



"Allth 



in EiBJfun^rv School Teacher. 



inveatigationa, and show the median scores found in these 
different places. The Freeman standards are inserted in 
each table for comparison. The figures in the columns at 
the extreme right show the total number of specimens 
rated in. each investigation. A comparison of the results 
shown in these tables with the standards proposed by Free- 
man (Table XIX), shows that the standards are higher 



HANDWRITING 

in most of the cases. This, together with other evidence, 
points toward a possible modification of the standards set 
by I 

Standards required for work. Ayres ' and Ashbaugh ' 
have drawn certain conclusions from the requirements in 
haodwriting which are set up by the examiners of the 
Municipal Civil Service Commission of New York City. 
Ashbaugh quotes a letter from the Acting Director of the 
commission as follows: — 

I find that the Municipal Civil Service Commission of New 
York ordinarily uses the standard of 70 per cent as a passing 
grade in handwriting, but for positions where handwriting is a 
special requirement the standard is sometimes set at 75 per cent. 

Ayres has shown that the ratings of 70 per cent and 
75 per cent, as given by the commission, correspond respec- 
tively to scores of 40 and 50 on the Ayres Scale. Since this 
commission recommends many persons who cannot write 
better than the 40 specimen of the Ayres Scale, and recom- 
mends others who WTite only as well as the 50 specimen, 
for positions where handwriting is a special requirement, 
it would follow that an ability to write as well as 50 on the 
Ayres Scale would be sufficient for all the demands which 
many pupils will meet. 

There is another obvious demand on the pupil's ability 
to write. This is the demand made by the high schools and 
colleges. We have but little data on this point, but many 
come to high schools unable to write rapidly enough for the 

' Ayres, L. P. A Scale for Meaaiiriitg the QualU]/ of Handwriting qf 
AduUt. (Russell Sage FoundatiuD. Bulletin E 138.) 

' Ashbaugh, Ernest J. Handwriting of ioica School Children. (BuDetin 
ol tite Univeraity of Iowa, March 1, 1916.) 



174 EDUCATIONAL TESTS AND MEASUREMENTS 

demands placed upon them. They then often sacrifice the 
quality of their handwriting for the sake of greater speed. 
Lewis ^ examined the handwriting of 17B0 third- and 
fourth-year students of 168 Iowa Normal Traming High 
Schools. He found their median score for quality to be 59.1 
on the Ayres Scale, with a range from 34 to 89. Fifty per cent 
of the scores fell between 53.6 and 64.3. The average speed of 
their handwriting was 90 letters per minute. Thus they rank 
with the seventh-grade standard for quality, and the eighth- 
grade standard for speed. Comparing their scores with those 
of many eighth-grade children, as shown in Tables XXH 
and XXin, these high -school pupils write from ten to fifteen 
letters per minute faster, but no better than the average 
eighth-grade pupil. These data bear out the statement that 
the higher schools require greater speed of handwriting than 
the training of the elementary schools have furnished, 

V. The Teaching Situation REVEALEn 

The point of view which arises when we measure hand- 
writing and set up standards .such as those proposed brings 
out the significance of certain teaching situations. First, 
it is apparent that handwriting is a very complex ability; 
second, the teacher of handwTiting must see her problem in 
terms of individuals and not in terms of classes; and, third, 
these individuals will differ widely in their abilities, in their 
consequent needs, and in other respects. A brief considera- 
tion of these points will make their significance apparent. 

' Lewis, E. E. "The Prcfent Standard of Handwriting in Iowa N'onnal 
Training High Schools"; in Ejlucaiional Adminiatraiion and Supervition, 
ToL 1, pp. 883-71. (December, IB1J(.) 



HANDWRITING 17« 

Handwriting a complex ability. The complexity of hand- 
writinf? ability is seen from two points of view. If we watch 
tiie pupil write we see that there are several factors called 
position, movement, ease of production, etc., and that 
each of these is reducible to other more elemental factors 
which are specific muscular adjustments with their attendant 
conscious processes. When we examine a specimen of hand- 
writing a similar complexity is apparent. Legibility, quality, 
and speed of production are not simple, but complex. Each 
in turn may be analyzed into factors which are finally depen- 
dent on muscular adjustments and conscious processes, 
similar to or identical with those reached by analysis from 
the former point of view. The muscidar adjustment, with 
its attendant conscious process which results from cither 
analysis, constitutes a si>ecific habit or specific ability which 
must be the point of attack for the t«aeher. 

An individual rather than a class problem. The children 
which the teacher has before her are not grouped on the 
basis of their needs in handwriting, but for other adminis- 
trative and economic reasons. When the needs of a class are 
analyzed it is found that each individual has an unique 
equipment of special abilities. Two individuals may receive 
the same mark or score on their specimen of handwriting, 
but one specimen may show the need of training along one 
line of abilities and the other show the need of training of 
another set of abilities. Hence a general prescription of 
drills for a class must be very wasteful, since some mem- 
bers will inevitably be fixing habits which are undesirable, 
and failing to develop the abilities which they need. The 
teacher has the problem of prescribing those exercises and 



J 



176 EDUCATIONAL TESTS AND MEASUREMENTS 

engendering those ideas of form which are needed by each 
individual. 

Children differ widely in abilities and needs. Kot only 
will children differ in their individual equipments of specific 
abilities, but they will differ ja their manner of responding to 
instruction and in their rate of developing through practice. 
This makes it necessary for a teacher to have not one but 
several remedies for each shortcoming which is encoun- 
tered. It also makes it inevitable that after a term of prac- 
tice and instruction, pupils will still be found to vary widely 
in their ability to write. Since this is inevitable and due to 
a characteristic of human nature, the teacher should not 
seek to secure liuiformity of abihties. But the teacher should 
be concerned that the majority of the class shall have passed 
a certain milestone in their development, and that every 
individual not suffering under an unavoidable handicap 
should make some advance in his ability to write. 

Plotting scores, and reading their meaning. The scores 
secured by measuring the handwriting of a class can be 
made to reveal valuable information for the teacher. Figs, 
11, 12, and 13 show typical conditions. These figures show 
the distributions of scores in speed and accuracy for three 
classes. Fig. 11 represents the distribution of scores for 
a third-grade class. The numerals along the bottom of the 
figure denote quality on the Ayres Scale, and speed in terms 
of letters written in one minute. The numerals along the 
side indicate the number of pui)il3. A perpendicular solid 
line shows the location of the median for the class, and a 
perpendicular broken line shows the location of the stand- 
ard for that grade. This same general explanation will fit 
Figs. 12 and 13. 



^ 



HANDWBmNG 



177 



IS 

10 

12 
9 
6 
8 






• 

1' 



80 80 40 GO 
Quality 




10 80 80 40 GO 60 70 



FlO. 11. ^OWDfO THX DiBTBIBUnON OF ScOBXB IN 

HANDWBimia OF A Thibd-Gbadb Class 

The Kae 11 indJotw the iMduui aoote for the dui, the line S the itaadeid lor 

thedaM. 




iS 
I 



M 




2080. 40 G06070208040 60 60 7080 90 100 
Qoalitr Speed 

Fio. 12. Showino thb Distbibution of Scobbs in Handwbitinq or 

A Foubth-Gbadb Class 

The Kne H indiaitci the median More for the daai, the line S the itandard for 

thedaM. 



18 

Iff 

18 

9 

9 
8 
D 



S 



l9 



40 50 60 70 80 90 100 
Speed 



20 80 40 60 60 70 
QaaSity 

Fig. 13. Showing thb Distribution of Scores in Hand- 
writing of a Fifth-Grade Class 



The line M indicates the median score for the daai, the line S the itandard for 

theclana. 



178 EDUCATIONAL TESTS AND MEASUEEMENTS 

Of the conditions revealed in these figures, four types are 
significant for the teacher. These are low class- medians, a 
) wide range of scores, low scores for individual pupils, and 
' high scores for individual pupils. Low class-medians and 
low individual scores refer to median and individual scores 
which are lower than the proposed standards for the 
grades under consideration. High individual scores are 
probably not signiGcant until they are higher than the 
standard for the eighth grade. A wide range of scores means 
that the pupils have not all profited equally by the instruc- 
tion received. When the range is very wide it means that 
some pupils are either defective or suffering from neglect. 

The scores for the third and fourth grades which are given 
in Figs. 11 and Vi show that the class-medians are low with 
the exception of the median for speed in Fig. 12. Assum- 
ing that the pupils should make a fairly steady development 
in their skill in handwriting from the second through the 
eighth grade, these low medians mean that the pupils in 
these classes are a year or more behind in their develop- 
ment. Either they must make very rapid progress in some 
one or more years, or they will leave the elementary school 
inadequately equipped in liandwriting. When the class- 
medians are as near to the standards as those shown in 
the graph for the fifth-grade class, it is probable that the 
class can, by a little extra effort, reach the standard in a 
short time. 

The graph for the scores in speed for the third-grade 
class, and both graphs for the fourth-grade class, show wide 
ranges of scores. The range of the speed scores for the fourth 
grade is obviously most unsatisfactory. In this case more 



HANDWRITING 179 

than a third of the class have scores as high as the standard 
for the seventh grade, and nearly as many have scores as 
low as the second-grade standard. This evidence suggests 
that the instruction in handwriting has had practically no 
effect on the speed of handwTiting of some pupils, while other 
pupils in the class have learned to write at more than stand- 
ard speed. If the instruction is given with a definite aim in 
view and is effective, the class will not show such distribu- 
tion as that of the fourth-grade class. In such cases as 
those revealed by wide distributions of scores the examioa' 
tion of individual scores will suggest the remedy. 

When individual scores are examined several classes of 
individuals wOl be discovered. Some will have scores up 
to the standards in both speed and quality. These individ- 
uals need give the teacher no concern, unless there may be 
some who have very high scores in both speed and quality. 
These may be excused from further drill in handftTitmg if 
this time can be used for other activities.' The remaining 
pupils are those having low scores in speed and high scores 
in quality, those having low scores in quality and high scores 
in speed, and those having low scores in both speed and 
quality. All of the pupils included under these three heads 
need special attention in the form of diagnostic measure- 
ment, as described above, and such instruction as is re- 
quired to meet the needs revealed by this diagnosis. 

Successive measurements to reveal progress. Successive 
measurements will reveal the progress which has been made 
by aelass during a given period. Table XXIV shows theprog- 

' See Freeman, F. N. "Handwriting"; in SixUeTtih Yearbook of the 
Natitmal Socitty for tht Study of Edveation, part 1. 



180 EDUCATIONAL TESTS AND MEASUREMENTS 

ress made by each of five classes in one city system. Speed 
is given in letters per minute, and quality in scores on the 
Ayres Scale. These classes were measured in September, in 
November, and in January, The gain is shown at the bot- 
tom of each column. The fifth grade shows a type of prog- 
ress in which there is a considerable gain in both speed and 
quality during each interval, the last scores being approxi- 
mately up to the standards. The sixth grade shows another 
type of gain in which the first interval gave a considerable 
gain in quality with but little gain in speed. The second 
interval showed a loss in quality but a marked progress in 
speed. 

Table XXTV, Progkebs n* Handwriting in One CiTr 





























GradH 






// 


III 


IV 




ri 




1 


I 


1 


1 


I 


I 


I 


I 


I 


! 




i 


39 


!4 


U 


aa 


36 


40 


ar 


60 
84 


•w 










Amount of (Bin - 
Novembor to 


" 


Total gain.. 


38 


" 


M 


« 


26 


* 


2fi 


17 


W 


" 




Meetii^ the situation revealed. The measurement of the 
ability of pupils to WTite reveals situations which demand 
that the teacher be resourceful in finding methods and devices 
which will remedy the shortcomings which have been shown 



HANDWRITING 181 

to exist. These methods and devices should be selected in 
the light of facts which have been established by investi- 
gations of the learning process/ as it occurs in learning to 
write. 

Systems of penmanship. There are not sufficient data 
from comparative studies of different penmanship systems 
to establish any single system as superior to others in its 
effectiveness to secure results in terms of speed and quality 
of handwriting. 

Movement in handwriting. Graves ° describes three kinds 
of movement; first, finger movement; second, "arm move- 
ment" in which there is some movement of the fingers and 
considerable movement of the arms; and, third, free-arm 
movement, in which "the respective movements of the 
fingers and of the arm are proportionally equal in amount." 
Of these three tj-pes of movement Graves concludes that 
arm movement seems to give greater speed, Nutt = does not 
find this to be so. Since Nutt secured a positive result of 
no correlation between speed and movement, and since 
his measuring devices and methods were more objective than 
those used by Graves, it seems safe to conclude that move- 
ment does not influence speed in writing for a short time. 
The apparent greater ease of production of arm or muscular 

' No atlempt is made to review or to criticiie the material which appears 
ID QumerDUB manuala of handwriting. Much excelleat materioJ which ep- 
peoTS in The Ttaching of Handwriting, by Freeman, is not even mentioned, 
because of lack of space. The difficulty of confining this discussion to the 
octu^ facts discovered through mensurement of handwriting will be ap- 
parent. 

' Gravea, S. Monroe. "AStiidy of Handwriting";in/ounMio/jEyuen- 
Honal Ptychology, vol. vii, p. 486. (October, 1916.) 

' Nutt, H. W. "Rhythm in Handwriting "; in Elementary Sekaot Journal, 
■nA. 17, Ki. 48«-*5. 



^ 



182 EDUCATIONAL TESTS AND MEASUREMENTS 

movement may result in greater speed if speed is measured 
during a long period of writing, 

Nutt also found that arm movement comes with age and 
motor development. None of the systems of penmanship 
were found to develop any appreciable amount of arm 
movement in younger children. Copy book nietliods and 
the teacher's emphasis on arm movement develop about the 
same degree of arm movement in ages ten to fourteen. 
Special supervisors seciu-e mare arm naovement in children 
of these ages, and also in nine-year-olds. Well developed 
arm movement did not produce better quality than move- 
ments in which the arm was moved but Lttle, Neither did 
well developed arm movement show greater speed. Neither 
does better arm movement result in an increase in rhythm. 
The child's natural rhythm o£ motion is an important factor 
in his learning to write. 

Rhythm. The rhythmic quality of the movement in- 
creases with age, but has no relation to amount of arm move- 
ment or to the quality of the wTiting. Nutt found that speed 
of writing and rhythm increase together. That is, children 
who score high in rhythm also score high in speed, are older 
than the other children, but may not use arm movement or 
produce a better quality of handwriting than other children. 

Speed. Both Nutt and Graves have shown that speed 
increases with age. Nutt shows that speed increases with an 
increase in the rhythmic character of the movement. An 
important factor in the production of speed of handwrit- 
ing is that of hand position. Graves says that a free and easy 
or loose-handed position is most conducive to speed. There 
is some evidence that girls write more rapidly than boys. 



■r 

1 Oualitr 



HANDWHTTING 



Quality and speed. Several studies have sought for a 
latiou between speed and quality of handwriting. In the 
Cleveland Survey ' it was found that " in general speed and 
quality vary inversely. But there is a middle series of 
speeds and qualities where improvement in one does not 
seem to interfere with the other." That is, outside of the 
limits which are approximately those of the proposed stand- 
ards, efforts to secure an unusual degree of quality will 
reduce the speed, and vice versa. Several investigations of 
adults' handwriting show that they tend to increase the speed 
and reduce the quality. A general view of the results bear- 
ing on this point shows that the children who write a good 
quality on the average write as rapidly as those who wTite a 
poorer quality. This seems to be due to the natural rhythm 
of the children. If this rhythm is forced or disturbed unduly 
the quality suffers. Thorndike's results indicate that caus- 
ing a pupil to write more slowly than his normal rate did not 
improve the quality of the handwriting. Nutt showed that 
increase of rhythm tends to slur over the abrupt strokes, 
while an increase in speed tends to slur the difficult junc- 
tions of strokes. Freeman says the so-called muscular move- 
ment produces a firmness and evenness of line and a regu- 
larity of slant. 

General laws of learning applied. The ability to write 
well is a habit, hence the laws of habit formation apply to. 
the acquisition of this ability. 

The first essential factor is a right start. The pupil must 
have a clear view of the habit to be acquired. This may 
mean a definite idea of the movement to be executed, or a 

' Judd, C. n. Measuring the Work of the Public Schools, pp. S 



184 EDUCATIONAL TESTS AND MEASUREMENTS 

picture of the letters or series of letters which are to be made. 
The start must be made with a strong initiative. Sometimes 
the pupil must be shocked into a desire to correct a fault of 
his handwriting. 

The second essential is that of attentive repetitions. 
The repetitions or drills should be strongly motivated. All 
investigations of habit formation agree upon this point. 
The periods of practice are most efficient if not carried to 
the point of fatigue, hence, for the lower grades Freeman 
suggests frequent ten-minute periods of practice. In no 
grades should the periods be longer than twenty minutes. 

The third step, as often stated, is, "Allow no exceptions to 
occur." If a pupil practices correct form in the penmanship 
class for ten minutes, and then uses poor form in a spelling 
class for the same length of time, the latter exercise will tend 
to cancel the effects of his practice in the penmanship class. 
In view of the frequent occurrence of such conditions as 
these, special periods, during a part or all of a term, may 
be set aside for intensive penmanship study and practice. 

A fourth step is the repetition of the habit until it is well 
fixed. This means that the repetitions must extend beyond 
the point of apparent completion to permanent automatism. 
After this stage is reached incentives should be found which 
will raise the habit from the level of mere automatism to 
higher levels of skill. 

Devices of remedial instruction. A few devices are given 
which are not usually found in the manuals, and which 
meet the specific needs revealed by measurements. Other 
devices may be found in the manuals issued by the repre- 
sentatives of the several systems of penmanship. 



"I 



HANDWRITING 185 

iicreasing speed. When a pupil habitually writes slower 
thaa the standard it will be well to force this pupil to write 
at standard speed. The influence of the increased speed can 
then be observed. If the teacher uses music as an aid to 
rhythm, the faster time of the music may increase the rate 
of a pupil's handwriting, but to insure this, the speed of 
handwriting should be carefully measured by accurate 
timing and actual count of the letters, A dictated exercise 
will accomplish the result more surely, and with economy 
of time and effort for the teacher. For example, the sen- 
tence, "The quick brown fox jumps over the lazy dog" 
contains thirty-five letters, 

a 4 tnin. 



Bth-grade pupils should write this II times 

7th 

tith 

Sth 



Sd 



30 « 



10 



The pupils should memorize the sentence and write it sev- 
eral times for practice and for spelling. The teacher should 
then time their writing, according to the table given above. 
Those who do not write the required number of letters in the 
allotted time should be told t:o write faster, until they have 
done the test successfully. 

These time intervals are calculated to meet the require- 
ments for speed as furnished by the Freeman standards. 
Another device which is even more convenient is represented 
by the following example. This is a dictation exercise for 
the sixth grade. The teacher should direct the class to be 
ready to write, then, watching the second hand of her watch. 



186 EDUCATIONAL TESTS AND MEASUREMENT 

untU it is at 60, start to dictate. A little preliminary prac- 
tice will make it easy to dictate the words so that they will 
be pronounced as indicated. For example, the teacher should 
be pronouncing the word " care " just before the second hand 
reaches the ten second mark, etc. 

5 10 20 30 

Do you take care to keep your teeth very clean, by wash- 
40 50 60 

ing them without failing every morning and after every 
15 20 30 40 30 

meal? This is very necessary both to preserve your teeth 

60 10 20 

a great while, and to save you a great deal of pain. (Stop.) 

Developing rhythm. If this exercise reveals a serious sacri- 
fice in the quality, or if the pupil's handwTiting cannot be 
brought up to standard speed, we may consider that the 
pupil's rhythm has not developed to the place where it will 
sustain tbb sjjeed. Since we do not know which is the 
primary factor, rhythm or speed, the best procedure would 
be to seek to develop both. Rhythm may be increased 
by the use of music. Careful attention to the securing of a 
free, well-relaxed hand position will aid in securing speed. 
Sometimes a careful analysis of letter forms will reveal that 
the student is forming some letters in a way that makes 
speed impossible. In such cases new forms of those letters 
should be taught. Sometimes the written work in classes 
other than the penmanship class places unusual demands 
on the pupil's powers. Throughout the lower grades written 
work, which places demands on the pupil's speed which are 
different from those of the writing lessons, should be avoided. 

In seeking to improve the quality of a pupil's handwriting. 



HANDWRITING 187 

analysis should be made by means of the Freeman Scale or 
the Gray Score Card. After these analyses sometimes it is 
well to call the pupil's attention to a single letter or combina- 
tion, and make this vivid by means of the warnings or sug- 
gestions which are most effective. When a single fault is 
corrected another may be attacked, until the pupil acquires 
this power to correct his faults. 

Motivating practice. A number of devices and plans have 
been proposed for the motivation of practice in correcting 
faults in quahty of handwriting. Wilson • gives the result of 
an interesting experiment in which the Thomdike Scale was 
used in such a way that the students coiUd follow their own 
progress in handwriting. In this case each student was com- 
peting with his own record. Several have constructed scales 
from the specimens collected in a school or class. These 
scales may be constructed by rating the specimens with any 
one or more of the scales described. Superintendent Bliss, of 
the Montclair, New Jersey, schools is quoted by Wilson as 
follows: " A scale made from the writing of pupils makes a 
stronger appeal than either the Ayres or Thomdike Scales." 

Charters ' recommends a " writing hospital " to which the 
poor writers are sent until they are properly convalescent. 
This hospital is a special penmanship class. Stone ^ has a 
plan which puts all the pupils of a school in four groups for 
their writing lessons. These are groups 1, 3, 3, and the ex- 

' Wilson and Wilson, The Motii!<UionofSchoolWork,p.l87. (Houghton 
Mifflin Company. 1916.) 

' Charters, W. W. Teaching the Common Branches. (Houghton Mifflin 
Company. 1913.) 

' Stone, C. R. "Motivation of the Formal Writing I*3son Through a 
Special Classification of Pupils for Writing"; in ScAacf and Home Education, 
June, 1915. 



188 EDUCATIONAL TESTS AND MEASUREMENTS 

cused groups. The special feature of this plan is that at 
stated intervals members of a lower group are allowed to 
challenge members of a higher group, and a contest for the 
coveted place ensues. 

Many special devices for naotivation are in use. Pupils 
write letters ordering supplies for the school, or they write 
invitations to school parties, pageants, etc. Some pupils 
write letters for the teacher or principal. 

Reasons for using handwriting scales. Even when meas- 
ures of handnTiting are not accurate they force the teacher 
to give attention to the specific faults and needs of tlie 
pupils. This measurement creates a critical and scientific 
attitude in the teacher toward the outcomes of instruction. 
This attitude tends to remove the attention from personal 
bias and feeling to an objective consideration of the results 
secured. Measurement of handwriting also banishes the 
old false standards represented by the perfect specimens 
which were produced from an engraved plate. In their 
stead are proposed some standards which are within the 
reach of a majority of the pupils. Thus many children can 
know the joy which comes from achieving something recog- 
nized to be of value. In addition to these values measure- 
ment is destined to become scientifically accurate and thus 
furnish a vaUd basis for instruction. 



QUESTIONS AND TOPICS FOR INVESTIGATION 



1. A teacher may judge the handwriting of her class by watching the 
pupils while they write or by examining the specitnena which they 
have written. Which is the better method if the purpose is to make 
coiDparisona of classes? Which ia better for discovering the hand- 
writing defects of individual pupils? What factors would you keep in 



4 





HANDWRITING 189 

mind in watching children while they write? What factors in the other 
method? 

2. Ask a class to write the three seoleoces from Lbcob's Gettysburg 
addreaa. Direct them to sturt tuf^ther and write as rapidly oa thej' 
c&D for one mmute. At the end of one minute slup them and direct 
them to record theb speed of handwriting. Then &sk them to Iwgin 
again and write for one minute writing oa 'meU aa they caq. If you wish 
to eliminate practice effects, repeat the cjqjeriment again reveraing 
the order of the directions. Note the different effects due to the nature 
of the directions. 

3. Use the dictation exercises given ou page 1S5. First dictate at the 
staJidard rate for the class. Next dictate at the standard rate for a 
class two grades lower. Then dictate at the standard rate for a class 
two grades higher. Examine the specimens using a standardized scale 

. and note the effects of the different rates of writing on the quaUly. 

4. Try several different selections for copy when collecting specimens 
and determine the selection which your pupils will write moat rapidly. 

fi. In what situations would you use the Ayres Scale? Thorndike Scale? 
Johnson-Stone Scale? Freeman Scale? Gray score card? 

6. What factor in handwriting is of most importance according to the 
Gray score card? Wiiat factor is of least importance? Why are these 
factors so rated? 

7. Select ten or preferably one hundred specimens of handwriting and 
rate them every day for several days by means of the scale you have. 
Keep the record of your day's rating but do not use them to help you 
in making future ratings. After several ratings note the cousistency 
of your ratings. 

8. Use the Gray Score Card (or Freeman Scale) in scoring the poorer 
apecimens of handwriting. Prescribe the drills you would use in coi^ 
reding these defects. Compare this with the recomraendationa of 
other teachers or students. Trj- your prescription on the pupils con- 
cerned if possible. 

9. For what purpose would you use the dictation exercises? 

10. Use the Gray Score Card by fiUing it ^th scores secured from use 
of the Freeman Scale, whenever they apply. 

11. Select a defect of letter formation frequently found in a pupil's hand- 
writmg. Direct the pupil's attention to this defect and challenge him 
to correct it. Direct that a record be taken as follows: U the defect 
were found in letter " a " inatruct the pupil to count the number of 
such errors to be found in fifty consecutive " a " a aa they occur in his 
handwriting written prior to the time you pointed out the defect. 
After a. period of practice, direct the pupil to make another counting 
from his handwriting written at some period other than tbe writing 



lUO EDUCATIONAL TESTS AND MEASUREMENTS 

BIBLIOGRAPHY 

n here. Additional references 



I. Scales aud Repebences 
1. Agret't featuring Scale for Handicriiing. Copies may be obtnined 
FromDivisiuDuf Education, Ruasell Sage F(iun(ia.tion, NewYurk City. 

Refebenceb: Ayrea, L. P. A Scale for ileaguriTig the Quality of 
Handmriiirtg of ChiMmn. 

Second Annual Report of Bureau of Edueational Meaiuremenl» and 
Standard!, 1915-16. (Kansas State Normal School.) 

Ashbaugh, E. J. Handwriting of Iowa School Children. (Univer- 
eity of Iowa, Extension Division, Bulletin no. 15.) 

King, Irving, and Johnson, Harry. *' The Writing Abilities of the Gle~ 
mentary and Gramuiar School Pupib of a City School System Meas- 
ured by the Ayrea Scale"; in Journal of Ediualional Ptychology, 
vol. 3, pp. 51i-ii0. (November, 1918.) 
S. Ayrea't Scale for Measuring the Quality of Handwriting of Adulti. 
Copies may be obtained from the Divbion of Education, Russell Sage 
FoundatJoQ, New York City. 

Refbrbncb; Ayres, L. P. A Scale for Measuring the Quality of 
Handwriting of AdvUt. (BuUetin E 138, Division of Education, Rus- 
sell Sage Foundation. New York City.) 
3. Freeman' a Handwriting Scale. Copies may be obtained from Houghton 
Mifflin Company. 

Rbfehences: Freeman, F. N. The Teaching of Handwriting. 

Freeman, F. N. "An Analytical Scale for Handwriting"; in Ele- 
merdary School Journal, January, 1915. 
i. Gray'i Score Card for the Meam-remenl o] Handieriting. Copies may 
be obtained from C. T. Gray, University of Texas, Austin, Texas. 

Refebence: ^ Score Card for the Measurement of Handwriting. 
(Bulletin no. 37, University of Texas.) 

5. Joktuon-Slane Scale. Copies not obtainable. 

Repehence: Johnson, George L„ and Stone, C. R. "Measuring 
the Quality of Handn'riting"; in EUmentary School Journal, Febru- 
ary. 1916. 

6. Thomdike't Scale for Measuring the Handwriting of Children in Gradet 
B to S. Copies may be obtained from the Bureau of Publications, 
Teachers College, Columbia University, New York City. 

Eefbbbnces; Thomdike, E. L. "Handwriting"; in Teacheri Col- 
lege Record. March, 1910. 

Thomdike, E. L. "Teachers" Eatimales of the Quality of Speci- 
mens of Handwriting"; in Teachers College Record, vol. 15, no. 5. 
(November, 1914.) 



"! 




HANDWRITING 191 

7. Ajnres's ** Gettysburg Edition " copies may be obtained from the 
Division of Education, Russell Sage Foundation, New York City. 

n. Genebal References 

Breed, F. S., and Down, E. F. "Measuring and Standardizing Hand- 
writing of a School System"; in EUmerUary School Journal, vol. 17, no. 7, 
p. 470. (March, 1917.) 

Freeman, Frank N. "Handwriting Tests for Use in School Surveys"; 
in Elementary School Journal, vol. 16, pp. 299-301. (February, 1916.) 

Freeman, Frank N. "Handwriting"; in The Fourteenth Yearbook of the 
National Society for the Study of Education, part i. University of Chicago 
Press, 1915, chap. v. Also in the Sixteenth Yearbook of the National So- 
ciety for the Study of Education, part i. (1917.) 

Graves, S. Monroe. "A Study in Handwriting"; in Journal of Educor 
tional Psychology, vol. 7, pp. 483-94. (October, 1916.) 

Johnson, Joseph Henry. A Comparison of the Ayres and Thomdike 
Handwriting Scales, Containing a table of equivalent values in the two 
scales. (North Carolina High School bulletin, vol. 7, pp. 170-73. October, 
1916.) 

Manuel, Herschel T. "Studies in Handwriting"; in School and Society, 
vol. 10, no. 116. (March 17, 1917.) Gives the frequency of letters and ele- 
ments of letters in writing as shown by the Ayres Spelling List. Suggests 
certain advantageous changes in forms of letters. 

Nutt, H. W. "Rhythm in Handwriting"; in Elementary School Journal, 
vol. 17, pp. 432-45. 



CHAPTER VI 



LANGUAGE 



I. The Problem of Measurement in Laj4guaoe 
One of measuring specific habits. Language functions 
in the communication of those things which are commonly 
called ideas and feelings by means of words. The choice 
and arrangement of the words give language its form. In 
written language spelhng and handwriting contribute addi- 
tional elements of form. The ideas and feelings which lan- 
guage comraunicates may be described as its content. 

The rules of grammar definitely prescribe many items of 
form. For example, a verb must agree with its subject in 
number and person; pronouns are inflected for person, case, 
gender, and number; verbs are inflected for mode, tense, 
person, and number; certain words must be capitalized. A 
pupil's control of those items of form which are definite 
and which occur frequently must be reduced to the plane 
of habit or automatic functioning, so that his attention 
may be focused upon the content which he is attempting 
to express. For these abilities the problem of measurement 
is the problem of measuring specific habits, and in this re- 
spect it is similar to the problem of measurement in the 
subjects treated in the preceding chapters. 

Rhetoric treats of the choice of words and the structure 
of sentences and paragraphs, but it does not prescribe de- 
finite objective standards for them. The quality of these 
features of form is detennined by the eSect of the Ian- 




LANGUAGE 193 



guage upon the reader, and this effect is not the same for 
all readers. However, rhetoric does furnish certain general 
principles which are useful to the pupil in guiding hb con- 
struction of a form which will attain his purpose. Here the 
problem of measiu^ment is different and more difficult. The 
functioning of the principles cannot be reduced to the 
plane of habit, because it is necessary that they function 
in a variety of new situations. 

The content of language is subtle and is not objective, 
except as it is given a form. It depends upon the vividness 
and the organization of ideas, and upon the wealth of asso- 
ciations which give the central ideas their setting. These 
features of content are expressed through the choice of words 
and the structure of sentences and paragraphs. In this 
way content and form are so int imately connected that aside 
from the features of form which are specified by the rules 
of grammar, any attempt to measure one b made diffii 
by the presence of the other. 

The instruments for measuring ability in language may 
be divided into two classes. The composition scales and 
Trabue"s completion-test language scales arc instruments 
for general measurement of language ability. The grammar 
scales and the copying test measure specific features of form. 



n. The Measurement of Ability in Englisq 

Composition 

For measuring compositions written by school-children 
scales similar in plan to the handwriting scales have been 
devised. The first of these scales is the Hillegas Composi- 
tion Scale. A revision of this scale has been published with 



icul^^H 

I 

d 



vim I 



194 EDUCATIONAL TESTS AND MEASUREMENTS 

the title. Preliminary Extenaiaa of the HiUegas Scale, by 
E. L. Thomdike. Other scales are: ^ The Harvard-Newton 
Composition Scale, devised by F, W, Ballou; A Scale for 
Mea£uring the General Merit of English Composition in the 
Sixth Grade, devised by F. S. Breed and F. W. Frostic; 
A Scale for Measuring Written English Composition, de- 
vised by M. H. Willing; and the Nassau County Supple- 
ment to the Hillegas Scale, devised by M. R. Trabne. Each 
of these scales consists of compositions arranged in the 
order of their merit. The relative merit of the compositions 
was determined by means of careful statistical studies.' 
1. The Hillegas Scale. This consists of ten compositions 

• SlHoiiard Tests in English, by S, A. Courtis, are not included in tlib 
liat. These tests were a group of tests in reading and language aad proved 
to be so cunihersonie to use that tlieir publication has been discontinued. 

' No attempt is made in this chapter to give the methods employed in 
deriving these scales nor to aummarize the criticisms which have been made 
upon the methods. For these the reader is referred to the following: — 

Hillegas, M. B. "A Scale for the Measurement of Quality In En^ish 
Composition for Young People"; in. Teachers College Record, vol. IS, no. 4. 
(September, 1918.) 

Ballou, F. W. "Scales for the Measurement of English Compositions"; 
in HaTvaTd-Nevtton Bulletin, no. 2. (Harvard University.) 

Kayfelz, Isidore. " A CrilicalStudy of the HiUegas Composition Scale"; 
in Pedagogical Seminary, voL 81, pp. 869-77. (December, 1915.) 

Kayfeti, Isidore. " A Critical Study of the Harvard-Newton Composition 
Scales"; in Pedagogicid Seminary, vol. 8S, pp. 325-47. (September, 1916.) 

Brownell, Baker. "A Test of the Ballou ixaie of English Composition"; 
in School and Socidy, vol. i, pp. 9S8-42. 

Breed, F. S., and Frostic, F. W. "A Scale tor Measuring the General 
Merit of English Composition"; in Elementary School Journal, vol. 17, 
pp. 3a7-M. 

Willing, M. H. Meaauremenl of Written Engliih Compoiition in the Pub- 
lie Elementary Sehoola of Denver, Colorado, (Master's Thesis, Chicago.) 
Bee also Report of the School Surrey of School District Number One in the 
City and County of Denver, part n, p. 59. 

Trabue, M. R. "Supplementing the Hillegas Scale"; in Teachen College 
Record, vot IS, p. 31. (January. 1917.) 




LANGUAGE 

rEmging from an artificial production whose scale value is 
zero to the tenth composition whose scale value is 9.3. 
Three of the ten compositions are artificial productions, five 
were written by high-school pupils, and the remaining two 
by college freshmen. No two were written on the same 
topic and they vary greatly in length and type. Each de- 
gree of merit is represented by only one composition. In 
the Thorndike Extension of the Hillegas Scale only a few 
of the compositions of the original scale have been used and 
several compositions are given for each degree of merit in 
the middle of the scale. Twenty-nine compositions repre- 
sent fifteen degrees of merit within approximately the same 
range as the original scale. This makes a more finely di- 
vided scale than the original one, 

2. The Harvard-Newton Scale. The Harvard-Newton 
Composijion Scale consists of four separate scales, one for 
each form of discourse; argumentation, description, exposi- 
tion, and narration. Each of the scales consists of six com- 
positions WTitten by eighth-grade pupils and arranged in 
order of merit as determined by the marks as^gned by 
teachers rating them as eighth-grade compositions. For 
each composition there is given a statement of the most 
significant m.erits and defects. 

3- The Breed and Frostic Scale. The compositions used 
by Breed and Frostic in deriving their scale were WTitten 
by sixth-grade pupils under uniform conditions. A part of 
a story called "The Picnic " was read to the class and they 
were given twenty minutes to complete it. The method of 
selecting compositions for the scale and determining scale 
values was similar to that employed by Hillegas. 




196 EDUCATIONAL TESTS AND MEASUREMENTS 

4, Willing's Scale. Willing used compoaitions written by 
pupils in grades four to eight on the topic, "An Exciting 
Experience." Several particular exciting experiences were 
suggested, and twenty minutes were allowed for writing. 
In determining the compositions to be used for the scale, 
"all errors in spelling, punctuation, capitalization and 
grammar were counted and corrected." The relative merit 
of the corrected compositions was determined and those 
compositions were selected for the scale which had the 
same rank in "story value" and frequency of errors. The 
scale is reproduced on pages 206-10, 

5. The Nassau County Supplement. The Nassau County 
Supplement to the Hillegaa Scale consists of nine compo- 
sitions, seven of which were written by elementary school 
pupils on the topic, "^Tiat I should like to do next Satur- 
day." The compositions of the scale were carefully se- 
lected and evaluated by an elaborate method which cannot 
be even sketched here. 

Reliability of measurements. These scales are to be 
used in the same way as the handwriting scales, In meas- 
uring the ability of pupils in the field of English composi- 
, tion, the first step is to secure compositions written under 
defined conditions. After the compositions are obtained the 
merit of each is measured by comparing it with the compo- 
sitions which make up the scale used afld its degree of merit 
is that of the scale composition which it most closely re- 
sembles. Assuming that the degree of merit of the scale 
compositions has been accurately determined, the ac- 
curacy of the measures obtained depends upon the relia- 
bility of the comparison. 



LANGUAGE 197 

The accuracy of measurements made by using the 
Hillegas Scale has been investigated by having the same 
compositions rated by a group of teachers, first, by the 
usual method and second, by using the scale. By means 
of a number of such investigations the conclusion has been 
reached that " the variability is somewhat greater with the 
scale than without it." ' However, in the investigations 
reported the teachers using the scale were untrained in 
its use. Furthermore, as in the case of handwriting prac- 
tically no attention has been given to determining the best 
methods of using a composition scale. Because of these 
two facts the conclusions reached in the studies just re- 
ferred to must be qualified. Thomdike has asserted that 
errors in using the scale will diminish with practice' and 
with sufficient practice they will be smaller than the er- 
rors now made by teachers in grading paragraph writing 
for general merit. Trabue states that " In spite of all criti- 
cisms of and objections to the Hillegas Scale, the fact remains 
that it is one of the most useful measuring instruments in the 
whole field of education." ' 

An investigation of the reliability of the measurements 
made with the Harvard -Newton Scale has been reported.* 

» Kelly, F. J. Teachers' hfaTka. p. 134. 

• Thorndike, E. L. "N'otes on the Significanpe and Use of the Hillegas 
Scale for Measuring the Quality of English Compoaition"; la Engliih 
Journat, Fol. 2, p. 651. 

' Trabue, M. R. "Supplementing the Hillegas Scale"'; b Teachera Col- 
lege Record, vol. 18, p. 51. 

* "Second Annuo! Conference on Educational Measurements"; Indiana 
UniverailyBuUetin, vol.13.no. 11, pp. 115-22. Also Hud elson. Earl. "Some 
Achievements in the Establishment of a Standard for the Measurement 
of English Compoaition in the Bloomington. Indiana, Schools"; in Engliak 
Journal, November, 1916, 



198 EDUCATIONAL TESTS AND MEASUREMENTS I 

Compositions written by 386 pupils in grades 7, 8, and 9 
were used. The variability of the marks assigned by means of 
the scale was slightly less than when the scale was not used. 

The other scales have appeared only recently and no study 
of their reliability has been reported. Since the Thomdike 
Extension of the Hillegas Scale is more finely divided and the 
degrees of merit are represented by more than one composi- 
tion it is possible that it will yield more reliable measures 
than the original scale. The seales devised by Trabue, 
Willing, and Breed and Frostic consist of compositions 
written on the same topic and under known conditions. 
When used to measure the merit of compositions written 
on the same topic and under the same conditions it seems 
probable that any one of these scales would yield a more 
reliable measure of a pupil's ability in wTitten expression. 
At least the procedure is more scientific. 

Use of the scales. The final test of the reliability of the 
measures obtained by using these scales must be based 
upon their use by teachers who have been trained in using 
them. The results of studies based on handwriting scales 
suggest that practice in the use of a composition scale will 
materially increase the reliability of measures. In the ab- 
sence of a scientifically determined plan of training, the 
plan given for handwriting (page 165) maybe used. Even 
if it is shown that composition scales have no value as in- 
struments for measuring the merit of compositions, it does 
not follow that the scales are without value to the teacher. 
The scales represent degrees of merit in English composi- 
tions, and thus assist the teacher in setting before her 
pupils the standards toward which they sliould strive. 



' LANGUAGE 199 

Particularly is this true of the Harvard-Newton Scale. 
By pointing out the merits and defects of each composi- 
tion the pupil ia given an objective statement of what the 
teacher expects of him. 

Directions for using the Hillegas Scale. The Hillegas 
Scale has been used in the surveys of Butte, Montana, and 
Salt Lake City, Utah. At Salt Lake City the EoUowing di- 
rections were followed for securing compositions written 
by pupils: — 

I. Each teacher is requested to ask her children to write a 
composition for her on the following theme: — 

"Suppose that you have twenty dollars, which you have been 
given to spend. You have five friends, and you decide to s)>end 
it in such a manner as will give the most pleasure to each. Tell 
what you would do or buy for each friend. The amount spent for 
each friend need not be the same, but the total for the five must 
be twenty dollars." 

S. The composition should be written with pen and ink on the 
regular writing paper. 

3. After the children are ready tot writing, read the subject to 
them, give them a minute or two to ask any questions, and as soon 
aa you are sure. that the children understand what they are to do, 
start them at writing. 

4. When the children have finished collect the papers, fasten 
those for each class together with a clip, and send to the office of 
the school principal, 

Hillegas Scale Scores. Similar directions were used at 
Butte. In both instances the compositions were rated by 
the teachers, using the Hillegas Scale, but no teacher rated 
the compositions written by her own pupils. The median 
scores for the two cities and the standards proposed by 
Trabue ^ a^e ^ven in the following table: — 

' Trabue, M. R. "Supplementing the HillegOB Scale"; in Teachert Col- 
lege Record, vol. 18, p. SI. (January. 1BL7.) 



4 



800 EDUCATIONAL TESTS AND MEASBEEMENTS 



1 



„™,. 


aaliL-jKCUy 


Buu^ 


Trnhu,.- 
Mediae Kort 


lifll 

lull 




S.i 
4.4 

3.8 
3.1 




7.2 
S.B 
6.5 
0.01 
S.5 
fi.O 
4.5 
4.0 
3.5 






4 
S 
S 

e 


11 

75 
40 
87 









































• Btra^<ileSiirtrtii^thcSc>miiSxiUmicfS<illLiiliiCity.niak,p.1*i. (IBtS.) A leviied 
cdltign hu be™ pubUthed (tal«) by Tbc World Book Company, uodsr tlus tiUe, ScAiBf 
f>Tvanisdlion and AdvtiniitrtOiofi, by E. F. Cubberlcy. 

t Aiparff/n Siincir q/ lis ScW £v'<"» of BuUc, ifonliKii, p, TB. (1»I4,) A KTued e£- 
tion bu been published IIBIS) by the lame film under tbe title, Smu Fni6<«u in Cil|F Scibol 
^(^minulraluHi, byG. D. StTBfcr. Seep.ldT. 



While the median scores for Salt Lake City show a rather 
even rate of progress from grade to grade, and are dis- 
tinctly higher than the Butte median scores, a charting of 
the distribution of the scores attained in each of the grades 
(see Fig. 14) reveals a distribution all along the scale. 
Many fourth-grade children wrote better than many eighth- 
grade children, and so for all of the grades. Such charting 
clearly reveab the individual and class problems with which 
teachers and supervisors have to deal. Fig. 14 shows not 
only the range of distribution of the scores obtained by the 
pupils of each grade, but also the per cent of pupils in 
each who attained each of the possible scores and the me- 
dian score for each grade. 

Directions for using the Harvard-Newton Scale. When 



)cale ScDM M IJt LflO MS U< UB LTB T.n ajB MT 



FHtentof popUa 




208 EDUCATTONAL TESTS AND MEASUREMENTS 

using the Harvard-Newton Scale the compositions should 
be wTitten under approximately the same conditions as were 
followed in obtaining the compositions for the scale. The 
following directions prepared on this basis have been widely 
used.' 

How TO Obtain Compositions 
The compositions should be written as a part of the regular 
school work. Give two periods of forty minutes each for this 
work — ■ one for preparation and writing and one for correcting 
and copying. Use one or two days, whichever is more convenient, 
but if two days are taken, collect the papers at the end of the first 
period and distribute them the next day. The pupils may use the 
dictionary and other reference books for preparation, but not while 
writing, correctinR or copying. The compositions should be written 
in ink. The teacher should answer no questions after the writing 
begins. 

Selecting a Subject 

The teacher should suggest a subject from the following list. 

Then permit each pupil to choose his subject, either one of those 

suggested to him or one proposed by him, except that it must not 

be a. subject upon which he has written recently. 



Tlie bat irsT to at<^h nU. 

Hno to bidld h shelter [ar the luiibt. 

Tlw hind man. 

Wben I betped liitbH hirvrat. 

Shauld ■ tavn bo; own a dug? 

When Uu fauu ma aoay. 

My Ont bushull game. 
Should r«]th*ll be iboliibedF 

I. — better than f (Fill in tli 

Stimild a bor be made la go to icboolP 
The use at keeping pet tahbiti. 
It it right Id catch flsb with a hookr 
Sliauld a bay play marliles fur keepq? 




H.-n lo fiet the bffit brtaktu 
When the Uimp Upped over. 
Whii^h il betL^r in the hoax. 
Oiiglit bofi ud girli Ulu p 


1 1 know, 
muiic or uti 


My firil psrtj-. 

Wh™ Mr. ■■ 

The .ant boy I 


house buiued. 
ever knew. 


ail: 
WbHl a bird Mine into the. 
How I kiiled giint LaiyMS. 

■n»-™ideoln^-d«.k. 
The boy iFtDU froDi mc. 


n«r 








BOYS AND 


GIRLS 




Wbtolbwome. voter. 
Tbe wock I like bat. 




Tescber. 


lioD worth wh 



After a subject has been selected by each pupil, direct him to 
write a composition " about one hundred fifty words in length, 
not over two hundred words." When the composition is fin- 
ished ask the pupil to write his name, grade, and the date upon 
the back. 



The Harvard-Newton Scale Scores. A number of copies 
ot the Harvard-Newton Scale have been placed in the hands 
of teachers, but no report of its use with a large number of 
compositions is available. 

In the investigation of the composition work in the 
schools of Bloomington, Indiana (see footnote 4, on page 
IftT), using this scale, the compositions were limited to 
description. Seven topics were given: (1) Some Person 
in Bloomington; (3) Grandfather; (3) An Old-Fashioned 
House; (4) A Picture; (5) A Public Building in Blooming- 
ton; (6) A Body of Water; (7) A Wreck. Thirty minutes 
were ^ven for actual composition of the first draft, ten 
minutes for preliminary explanation, and thirty minutes 
for writing. Later the pupils were allowed to correct and 



«M EDUCATIONAL TESTS AND MEASUREMENTS 

rewrite for thirty minutes more. Each composition was 
rated by three teachers, and, based on the average of these 
tliree ratings. Huddleston obtained the following median 



"! 



g™* 




.V«d.fln««-( 




61 

57 
61 
68 
78 
67 




ixb 


68 



















Directions for usii^ the Willing Scale. In using the 
Willing Scale these directions should be followed.' 

The teacher should write on the blackboard these topics: ' 
An Exciting Experience 

A Storm 

An Accident 

An Errand at Night 

A Wonderful Story 

An Unexpected Meeting 

In the Woods 

In the Mountains 

On the I<e 

On the Water 

A Runaway 

The teacher should then sa.y to the pupils : " I want you to 
WTite me a story. It is to be a story about some exciting 
experience that you have had, about something very in- 

' Theae directiona are based upon those followed by Willing in securing 
compositionB for his scale. The author is indebted to Mr. Willing for s copy 
of his thesis and for permission to print his scale. 

• It is probably belter to furnish each pupil wilh a printed list of the 



LANGUAGE i03 

teresting that has happened to you. If nothing of the 
sort has ever happened to you, then tell me of an exciting 
experience some one you know has had. You may even 
make up a story of thb kind, if you have to, though I believe 
you will do better, on the whole, with a real one. I am going 
to give you aixiut twenty minutes in which to write. You 
are to write on both sides of the paper, to do all the work 
yourselves, and to ask no questions at all after you begin. 
You may make whatever corrections you wish between the 
lines. There will be no time to rewrite your story. 

" I have written the general subject on the board together 
with some suggestions. You do not have to write on any 
of these topics unless you want to ; they are merely to help 
out in case you cannot think of an exciting experience your- 
self. You may begin now as soon as you wish." 

Allow opportunity for asking questions and make an effort 
to put the children at ease. Allow full twenty minutes for 
the actual writing. At the end of this time say to the 
pupils : — 

"You are to have four or five minutes in which to finish 
your stories, make corrections and count the number of 
words written. Write this number at the end of your story. 
Also write your name and school grade." At the end of 
five minutes collect the papers. 

The method of using the scale in the Denver Survey b 
given by Willing as follows : — 

The method of using the scale which was developed from al- 
most the start was as follows; A composition was read carefully, 
with attention to both rhetorical and, what we have been calling. 
format elements. As the reading progressed, there was a conscious 
eSort to place it on the scale, so that by the end of the reading its 



206 EDUCATIONAL TESTS AND MEASUREMENTS 

fate bad frequently beeo decided. But in many, many cases the dis- 
crepancy between the story value and the form value was such as 
to dictate an adjustment or compromise of some sort. In these 
cases it was the writer's habit to assign the story value first and 
then try to locate the formal worth. A compromise between the 
two had then to be assigned as the mark of the paper. Thus a com- 
position grading very low as a story might by excellence in formal 
matters achieve as high a mark as 60; and, vice versa, a compiosi- 
tion of high story value but low formal quality might be marked as 
far down as 40. But no paper was marked above TO which did not 
have both good story value and technical excellence to commend it; 
nor was a paper marked below 40 which did not lack both of these 
qualities. It is not possible to state exactly the relative emphasis 
that was placed on story value and form value, but the effort was 
made, within the limits just mentioned, to keep the two approxi- 
mately equal. The interpolated marks, S5, 35, etc., were used in 
grading. 

The use of this scale has been shown to be quite consistent 
when one judge rates all of the papers. It still remains to 
show how consistent the ratings will be when several judges 
are employed. 

The Willing Scale Scores. For the Denver Survey the fol- 
lowing median scores were obtained; — 

4th A 31.5 

J5th A 43.4 

6th A 50.9 

7thA 60. ie 

8th A 63.4 

WnjjNQ Scale fob Measurinb Written Composition 

(The values; 90, 80, 70, 60, 50. 40, 30, and 90 given the respective 
samples are arbitrary and merely for practical convenience. 20 
means 15 to 24.9, 30 means 25 to 34.9, etc.) 



The most escJting experience of my life happened when I w 
but five years of age. I was riding my tricycle on the top of o 




LANGUAGE 



207 



high terrace. Beside the curbing below, stood B vegetable wagon 
and a horse. Suddenly I got too near the top of the terrace. 
The front wheel of my tricycle slipped over and down I went, 
lickty-split, under the horse standing by the curbing. I had quite 
a high tricycle and the handle-bars scraped the horse's stomach, 
making him kick and plunge in a very alarming manner. I was 
directly under him during this, but finally rolled over out of his 
way and scrambled up. I looked at my hands! Most of the first 
finger andpart of the thumb of my left hand were missing. Tlie 
horse had stepped on them. I had endured no sensation of pain 
before this, but now my mangled hand began to hurt terribly. I 
was hurried to the hospital and operated on, and now you w«uld 
hardly notice one of my fingers is missing. 1 certainly have good 
cause to congratulate myself on my good fortune in escaping 
withaslittleinjurytomySelf as I did, fori might have been terribly 
mangled in my head or body. 



Number of mistakes ii 
per hundred words — - 0. 



spelling, punctuation, and syntax 



Near our ranch in Fort Logan there was a chicken ranch. One 
day my sister and I went up to the chicken ranch on our horses. 
Coming back there was a road leading from our house to the main 
road and along this road were half rotted stumps. On every one of 
these stumps what do you think we saw. We saw snakes! snakes! 
snakes! I suppose these snakes nere shedding their skins, they 
were of every color, shape, and size. But when sister and I saw these 
snakes we whipped our horses into a gallop and away we went 
just as hard as we could go. When we got to the house we went in 
and mamma couldn't get us out of the house that day. 1 was so 
scared that I beUeve I dreamed about snakes for a month. 

Number of mistakes in spelling, punctuation, and syntax 
per hundred words — 5, 

C — 70 

When I was in Michegan I had an exciting thing happen or 

rather saw it, it was when the big steamship plying between 

Chicago and Muskegon was sunk about 7 o'clock in the evenin;;. 



208 EDUCATIONAL TESTS AND MEASUREMENTS 

It caught OD fire with a load oF cattle and products from the market 
on board, one of the lifeboats carrying some of the few people who 
were on board landed at our pier. The "Whaleback" steamer 
nhich goea between Chicago and Muskegon was two hours later 
in coming than the freighter and was stopped to clear up the 
wreckage, all of the cattle and products and an immense cargo 
of coal were lost, but there were only two people lost, the ship tried 
hard to get to port with her cargoe but, could not reach it. The 
next morning we found planka, and parts of the wreck on the 
beach. Our cottage was at the top of a cliS and it was just one 
hundred feet to the lake from our cottage, we had a beautiful 
view, and the sight of the fire on the horizon was a beautiful sight 
(though it was pitiful). 

Number of mistakes in spelling, punctuation, and syntax 
per hundred words — 8. 

D — 60 
One time when mother, some girl friends and myself were stay- 
ing up in the mountains. An awful storm came up. At the we were 
way up the mountain. The lightning flashed and the thunder 
roared. We were very frighted for the cabin we were staying 
at was at the foot of the mountain. We did n't have our coats 
with us for it was very warm when we started. There were a few 
pine trees near us so we ran under them. They did n't do much 
good for the rain came down in torrents. The rain came down so 
hard that it uprooted one of the trees. Finely it began to slack a 
little. So we thought we would try and go back. About half way 
down the mountain was a little hut. We started and when got about 
half way down it began to rain all the harder. We did n't know what 
to do for this time there was n't any trees to get under. We decided 
to go on for the nearest shelter was the hut. Finely we got there cold 
and wet to the skin. 

Number of mistakes in spelling, punctuation, and syntax 
per hundred words — 11. 



One time mother and father were going to take sister and I for 
a long ride tbankagiving, Wehad to go 60 miles to get there. When 



LANGUAGE 209 

sister and I herd about it we were very glad. It was a very cold 
trip. We four all went in a one- seated automobile. Dady drove 
and mother held me and aister sat on the top the top was down. 
Mother could not hold sister for she was two heavy. When we gol 
there they had a hot fire ready for ua and a goose dinner. We were 
there over night. In the morning it vas hot out. This was on a 
farm. Sister and I got to go horse-baek riding. It was lots of funs. 
They had children. The children were very nice. Our trip home was 
very cold. When we gol home it had snod. 



Number of mistakes in spelling, punctuation, and syntax 
per hundred words — 14. 



My antie hod her barn trown down last week and had all her 
chickens killed from the storm. Whilch happened at twelve o'clock 
at night. She had SO chickens and one horse the horse was saved 
he ran over to our house and claped on the dor whit his feet. 
When he saw him my father took him in the bam where he stepped 
the night with our horse. When our antie told us about the ac- 
cident we were very sorry the next night all my anties things 
were frozen. The storm blew terrible the neit morning and I 
coidd not go to school so I had to stay home the whole week. 



Number of mistakes ii 
per hundred words — 17. 



spelling, punctuation, and syntax 



The other day when I was rideing on our horse the eogion was 
comeing and he got frightened so he through me down and I broke 
my hand. 

And the next thing I done was I went to the doctor and he put 
some bandage on it and he told me to come the next day so 1 came 
the nest day and he toke the bandage off and he look at it and 
then it was better. 



Number of mistakes ii 
per hundred words - 



spelling, punctuation, and syntax 



aiO EDUCATIONAL TESTS AND MEASUREMENTS 



Deron the summer I got kicked and spraio my arm. And I was 
in bed of wheeks And it happing up to Washtion Park I was goinij 
to catch some fish. And I wad so happy when I got the banged of I 
will nevery try that stunt againg. 

Number of mistakes in spelling, punctuation, and syntax 
per hundred words — 30. 

III. The Measurement of Language Abilttt bt 

CoMPLE'noN Tests 

The Trabue Completion-Test Language Scales. Trabue 
has devised a series of Completion-Test Language Scales 
for the general measiirement of language ability. Of these 
scales he says; — 

No attempt has been made to define "language" in any strict 
sense, and it is eutirely possible that some persons may be able 
to apeak the English language and f>erhaps to write it fairly 
welt without being able to make a very high score on these 
scales. It may also happen that some individuals will be found 
who score well on these language scales and are yet unable to quote 
a single rule of English grammar. On the whole, however, it will 
be found that ability to complete these sentences successfully is 
very closely related to what is usually called "language ability." ' 

Each scale consists of sentences from which one or more 
words have been omitted. The position of tlie omitted 
words is indicated by a blank. The pupil being tested is to 
WTite in the missing words. The relative difficulty of the 
sentences has been carefully determined and they have 
been arranged in order of difficulty. Directions for giving the 
tests, scoring the papers and tabulating the results are given 

• Trabue, M. R. CompUfion-Test Langnaf? Scales, p. 1. Teachers 
College ContributioQ3 to Education, No. 77. IBie. 



^ 



^^PI^^F LANGUAGE an 

by Trabue on page 20 of hia monograph. Scale B is repro- 
duced here to iUustrale this type of test. 

Trabue Complbtion-Test Language Scale B 


5m- 


.^ 




1 
« 
8 
82 

S3 

ai 

35 

SB 
48 

54 


.96 
1.98 
2.94 
l.M 

SAO 

6.50 
7.40 

8.4« 
9.50 

10.76 


We like good boys girls. 

The is barking at the cat. 


Time often more valuable 

money. 


sick. 

She if she will. 

Brothers and sisters always 


quarrel. 
weather usually a good 

effect one's spirits. 

It is very annoying to 

most time imaginable. 

To friends is always the 




Trabue teat standards. These scales are so recent that no . 

ha^ been reported. If investigation shows, as Trabue claims, 1 
that they measure an ability that is " very closely related to ^ 
what is usually called ' language ability,' " they will furnish 
a very convenient means of measurement. They require 
only a few minutes of the pupils' time and the definite in- 
structions for scoring insure reliability. However, at best 
they can yield only general measures of " language ability," 
and some other means must be provided for diagnosis. 



I 



Hi EDUCATIONAL TESTS AND MEASUREMENTS 

Trabue gives the following tentative standajda for scalea 
B, C, D, and E. These scales were constructed approxi- 
mately equal in difficulty: — 

Grade Median 

U 3.0 

m 6.0 

IV 8.0 

V 9.6 

VI u.o 

vn. 14. s 

vm is.s 

IX u.« 

X Ifi.S 

XI 16.8 

xn ie.2 



rV, The Measobement of Ability in English Gbammab 
Types of ability. Two types of ability in English grammar 
may be recognized; first, the pupil's knowledge of language 
forms and the rules governing their use, and second, the 
pupil's ability to use language forms correctly. Since the 
fimction of grammatical knowledge is expected to produce 
approved language forms, it is the second type of ability 
with which the school is primarily concerned as an outcome 
of instruction in grammar. 

Starch's Grammatical Scales. Starch ' has devised three 
scales (A, B, and C) to measure a pupil's ability to use 
correctly certain language forms. His Grammatical Scale A 
consists of a series of exercises such as the following: — 

' Staroh, Damd. " The Meaauretnent of Achievement in English Gram- 
mar"; in Journal of Educational Psychology, vol. 6, pp. 615-86. Alio in 
his Educational iteaauremenii, pp.lOS-OS. 



1 



LANGUAGE 



1. A fireman seldom rises above (an engineer; the position of 
an engineer). 

2. The diffeRnce between summer and winter (is that; is) 
summer is warm and winter is cold. 

3. He is happier than (me; I). 

4. They are {allowed; not allowed) to go only on Saturday. 

The pupil is given a printed copy of the scale and is di- 
rected thus: " Each of the following sentences gives in paren- 
thesis two ways in which it may be stated. Cros-s out the 
one you think is incorrect or bad. If you think both are 
incorrect, cross both out. If you think both are correct, 
underline both." Pupils are given as much time as they 
need. 

The sentences of the exercises have been chosen so that 
the difference in difficulty between any two successive steps 
of the scale is equal to the difference between any other two 
successive steps. A pupil's score is the highest step of 
the scale of which he does correctly three out of the four 
sentences. If a pupil fails on a given step, say the seventh, 
but does the ninth correctly, hi.s score is 8. Thus, a pupil 
receives credit for each exereise of which he does correctly 
three out of the four sentences, but the sentences have been 
so arranged that only in a few cases will a pupil be able to do 
an exercise after he has missed the preceding. 

As tentative standards of attainment Starch gives the 
following scores for the use of these scales: — 

Grade Vll VIII IX X XI XII Freshmen 

Score 8.0 8.3 8.6 8.9 D.2 9.5 10,3 

The scale includes many different items of grammatical 
form and apparently these are arranged in no systematic 



Hi EDUCATION.U, TESTS ANB MEASUREMENTS 

maimer. Therefore, a pupil's score can be only a general 
measure of his ability to use correct language forma. To 
secure detailed information concerning his weaknesses it 
would be necessary to examine his test paper. 

The Punctuation Scale. Starch has also devised a Punc- 
tuation Scale ^ which is similar in form to the Grammatical 
Scale. The exercises consist of sentences to be punctuated. 
The following extracts illustrate the nature of this scale. 
. Step 6 

1. We visited New York the largest city in America. 

2. Everything being ready the guard blew his horn. 

3. There were blue green and red flags. 
i. If you come bring my book. 

Step 12 

1. When thou goeat forth by day my bullet shall whistle past 
thee when thou liest down by night my knife is at thy throaL 

2. Oh come you'd better. 

3. The president bowed then Hughes began to speak. 

A pupil's score is determined in the same way that it was 
in the case of the Grammatical Scale. In the case of both 
scales certain features of the form of language liave been 
isolated for the purpose of measurement. They are given 
in a context and in this respect the scales are similar to the 
sentence spelling test given on page 124. The application of 
these scales is simple, and the scores are reliable. The pupil's 
abiUty to distinguish certain correct forms is measured. 
Tentative standards of attainment are the same as for the 
Grammatical Scale. 

The Grammar Tests. Starch has also devised three 
' Starch, D. Edueationai Meamremenla, pp. 108-10. 



LANGUAGE 

tests for measuring directly a pupil's ability to recognize 
certain language forms.^ In Test 1 the pupil is asked to mark 
the part of speech of each word in a certain printed text. 
Hb score is the number he designates correctly in three 
minutes. Test 2 calls for the designation of the case of the 
nouns in another printed text. Test 3 has to do with the 
tense and mode of verbs. 

The author of these teats points out two limitations, first, 
"failure to cover all phases of grammatical knowledge"; 
and, second, " counting one designation of a part of speech, 
case, tense, or mode as equal to any other." The first limi- 
tation is really not a limitation of any one of the tests, but 
of the group of tests. The second results in considering un- 
equal units equal which introduces a source of error. 

Buckingh&in's Test. In making the survey of the Gary 
and the Prevocational Schools of New York City, Bucking- 
ham used a series of questions upon English grammar.' 
These questions were carefully evaluated upon the basis 
of difficulty. They have been rearranged and pubhshed by 
Haggerty." 

V. Meabuhing Accdbacy in Copying 
Copying is a phase of school work which receives little 
exphcit attention. This probably is due to the assumption 
that pupib are able to copy accurately because it appears 
to be such a simple activity. Copying bears a relation to 
written expression and to other school subjects as well, 

' Starch, D. Educofionoi MeatJirem^nta, pp. 110-13. 
' Serenieenth Anmuil Report of the dly Sifperinlendenl of SckooU, Jflli- 
1913. (Department of Education, City of New York.) 
> Bureau of Cooperative Besearcb, Uoiveruty of Mimiesota. 




416 EDUCATIONAL TESTS AND MEASUREMENTS 

Themes are iisually copied before being submitted to the 
teacher. In solving problems in arithmetic the quantities 
are copied from the text. In gathering information from 
references copying occurs. 

The Boston test. The following test of pupils' ability 
to copy printed matter was prepared by a group of Boston ' 
teachers : — 

DlHECTlONB rOH GiVINO AJJD ScOBING THE TeBT 

1, Read to the pupils the directions which are printed at the head 
of the selection they are to copy, but give them no Further help. 
For example, do not specify possible errors which may be made. 

8. Pupils ought not to see the selection until they are ready 
to copy it. Hence it should be placed on the desk face down until 
the signal is given to begin work. 

3. Every error should be checked distinctly. 

4. The errors which were to be noted were as follows ; In spelling, 
capitalization, punctuation, undotted "i'a," uncrossed "t's": in 
omitting words, in adding words, in wrong words used, and in 
misplaced words. 

DlHECTIONS TO PupiLa 
Copy in ink as much of the following selection as you can copy 

accurately in fifteen minutes without hurrying. Accuracy is more 
important than speed : — 

LieoTENANT Omxea 
In IhU stcry a young British lieutenant, in a moment of extreme irri- 
tation, strikes a private soldier. The act a one that calls for dismissal from 
the Queen's service. What is the officer to do? He caniiot send money to 
the soldier — who happens to be the redoubtable Ortheris himself^ nor 
can be apologize to him in private. Neither can he let matters drift 
Ortheris, too, has his own code of pride and honor: he too is a "aervcuit of 
the Queen"; hut how is the insult to be atoned for? The way out of this 
apparently hopeless muddle is a beautifully simple one. after all. The lieu- 
tenant invites Ortheris to go shooting with him, and when they are alone, 
aakahim " to take off his coat." "Th»nkyou,sir!"saysOrtheri8. Thetwo 

' Hchoot Doeument no. g, 1016. (Boston Public Schools, English, Da- 
temiitiiig a Standard in AceuTate CojTfing.) 



LANGUAGE 217 

men Sgiil until Ortheris owns that he is beaten. Then the lieutenant 
biologizes for the original blow, and the officer and private walk back to 
camp devoted friends. That 6ght is the moral salvation of Lieuteoaut 
(Meaa- (Bibs Perry, A Sludg o/Proae Fiction.) 

Kinds of errors made. This test was given to 4494 first- 
year pupils in the Boston high schools in November, 1914, 
and therefore may be considered to measure the abihty of 
pupils completing the eighth grade. The results are both in- 
teresting and significant. The following is quoted from the 
bulletin mentioned above: — 

The errors not«d consisted of nine different kinds, and the niun- 
ber of each kind made in this test by 4494 pupils is shown by the 
following tabulation; — 

Spelling 6,8«0 

Capitalization B44 

Omitted words *,OTT 

Added words ■ 806 

Wrong words used 8W 

Misplaced words lOJS 

Punctuation 5.876 

Undotted " i's*' 8,784 

Uncrossed "t's" 606 

Total «7.377 

Average errors per pupil 6,fi4 

Misspelled words. The test consisted of 170 words, 105 of them 
different words. It is a notable fact that every word was misspelled 
by somebody. It is also interesting that 92.2 percent of the words in 
the test are found in Jones's Concrete Inteatigation qf the Malerial 
t^ English Spelling.^ In spite of the fact that these are words 
commonly used by children in their writing, 11.8 per cent of them 
were misspelled more than 100 times. This does not mean that 
1 1.8 per cent of the children missed these words, because one pupil 
might have missed the same word more than once. 

It is impossible to make any statement in regard to the average 

because many of the words occur in the selection more than once, 

and if misspeQed by the same person each time it occurs it iscounted 

' Published by the University of South Dakota. 



218 EDUCATIONAL TESTS AND MEASUREMENTS 

more than one error. Some children apeiled a word incorrectly in 
one place and correctly in another. One boy apeiled "lieutenant" 
wrongfourout of five times, and apeiled itadifferent way each time. 
Then, not all the children finished the entire selection, and no 
record was kept of the esoct number of words each wrote. How- 
ever, 4,494 pupils taking the teat made 5,829 errors in spelling 
alone, the number of errors for each word varying from 1 to 1,045. 
Undotted "i's" and uncrossed "t's." The errors made by leav- 
ing the "i's" undotted and the "t's" uncroased comprise about one 
tiird of the entire number of errors and are largely important be- 
cause of their value to legibility, as pointed out by Ayrea. In con- 
nection with these errors, it is very noticeable that moat of them 
were confined to comparatively few pupils. If a child shotfed a 
tendency to dot his "i's" and cross his "t'a" in the first few lines, 
the chances were that that individual would have but few errors. 
On the other hand, if the child made many errors in the first part 
of the paper, there were many throughout the copying. One boy 
went through tlie entire paper without dotting an "i." Many 
others dotted only a small part of them. 

This same test was given in Kansas City, Missouri, to 
the pupils in the seventh grade and in the first year of the 
high school. (Kansas City has onlyseven grades below the 
high school.) The average errors per pupil was 8.04 in the 
seventh grade, and 6.83 in the first year of high school. 

VI. Educational SiGNiFrc.\NCE of the Use of these 
Scales and Tests 

Finding specific langu^e weaknesses. The composition 
scales and the completion-test language scales fulfill the 
same function in the field of language as the handwriting 
scales of Ayres and Thorndike do in that field. They are 
instruments for general measurement. By means of them a 
teacher can obtain a measure of the language abiUties of her 
pupils in terms of fixed units which she may compare with 
established standards or with similar measures of other 



LANGUAGE 219 

groups of pupils. They also iDdicate those pupils who are 
below standard and who for this reason need instruction. 
However, before this instruction can be intelligently given 
the language ability of these pupils must be diagnosed to 
locate the exact defects. 

The grammar tests and the copying test measure specific 
abilities. They furnish the teacher with detailed information 
about the several members of her class. If a pupil is weak 
in punctuation that fact is revealed. The teacher then knows 
that she must instruct that pupil in punctuation. If a pupil 
makes certain types of errors in copying, he needs certain 
instruction. 

Remedying the situation revealed. When a teacher learns 
the specific language weaknesses of her pupils she is then in 
position to apply more intelligently her stock of methods 
and devices of instruction. In language as in the case of the 
other subjects, the teacher must instruct individual pupils 
who are grouped together rather than groups of pupils. 
Furthermore, each pupil should receive the instruction which 
he needs to correct his language errors. 

If pupils are weak in a language ability, such as punctua- 
tion, the laws of habit-formation apply. After being sure 
that he understands the function of the punctuation marks, 
a pupil miist have practice in punctuating his own writing. 
This probably is not sufficient. Exercises for practice can 
be constructed by taking appropriate material and repro- 
ducing it without the punctuation marks. 

Until a teacher recognizes definite and specific ends to 
be attained there is certain to be a large degree of dissipa- 
tion of her efforts. Perhaps one reason why language in- 



«aO EDUCATIONAL TESTS AND MEASUREMENTS 

, structioo so often does not produce satisfactory results is 
that it is not directed toward the engendering of definite 
abilities. That our present standards of language are cha- 
otic is indicated in the report of a recent investigation.' 

Analyzing language ability. The six compositions com- 
prising the Harvard-Newton Exposition Scale were repro- 
duced without any identifying marks. They were graded 
on the scale of 100 per cent by twenty-four eighth-grade 
teachers who were asked to follow certain typewritten 
directions. The six compositions were then "completely 
corrected ao far as mechanical or measurable errors were 
concerned." The corrected compositions were graded by 
the same teachers according to the same directions. 

If the " mechanical errors " of the compositions were sig- 
nificant factors in determining the first set of marks, the 
second set of marks should be conspicuously higher. How- 
ever, this was not the case. For two of the compositions the 
average " grade " was less after the " mechanical errors" had 
been corrected. The individual marks show that some teach- 
ers consider form important, and that others tend to disre- 
gard it in marking a composition. 

In teaching spelling teachers have kept a record of pupils' 
errors and have emphasized these words in their teach- 
ing. In our consideration of spelling it was urged that 
teachers first ascertain what words their pupils were unable 
to spell correctly. This plan may be adapted to the teaching 
of other aspects of language. The teacher should nscertain 
the pupils' grammatical errors, and then equip them with 



LANGUAGE 821 

the rules of grammar which are needed to correct them. 
This has been done on a large scale m St. Louis and Kansas 
City, Missouri.^ 

Perhaps the scales and tests described in this chapter will 
have fulfilled their most important fmiction if they cause 
teachers to analyze and define "language ability" in more 
specific terms. It is beheved that their use will tend to pro- 
duce this result. Analysis of " language abiUty" and specific 
definition of the elements are greatly needed. Upon the 
accompfishment of these two things depends the construc- 
tion of more valuable measuring instruments in the language 
field and the scientific determination of methods and devices 
of instruction. 

QUESTIONS AND TOPICS FOR INVESTIGATION 
1. How does the problem of measurement in language differ from the 

problem o( roeasurement in arithmetic? 
a. What makes the problem of measurement in language difficult? 

3. How does the Hillegas Scale differ from the Willing ScaleP The 
compositions which compose the Willing Scale were written under 
defined conditions and on similar topics. Does this make it a su- 
perior scale? 

4. Which of the composition scales described in this chapter will be the 
most helpful to the teacher? Why? 

5. Give the copying test to your pupils following Ihe directions carefully. 
Do the results agree with your estimate of tie ability ot your pupils 
to copy? 

G. Keep accurate liata of the language errors of your pupila. What are 
the rules which are necessary to correct these errors? Are they the 
rules upon which you are placing the most emphasis in your teach- 
ing? 

7, Do you have definite objective standards of attainment in English 
composition? Can you use the tests described in this chapter to 
establish such standards? 

8. Do you think pupils would be helped by having definite objective 
standards of attainment established for them? 

' See report by W. W. Charters in Ihe Sixleentk Yearbook of Ike NiUional 
Society for Ike Study of Ediiailion, part i. 



Hi EDUCATIONAL TESTS AND MEASUREMENTS 



BIBLIOGRAPHY 

Only the most important references are given here. AdditionuJ references 
wiU be fuUDd in the footnotes in the chapter. 

I. Grammar 

1. StaTch'i Oraminatical SraU A. Copies may be obtained from Daniel 
StAlch, University of Wisconsin, Madison, Wisconsin. 

£. Starch'i Piinetjiation Scale A. Copies may be obtained from the above 
address. 

5. Slarnh'i Englith Oramntar Tent 1, 2, and S. Copies may be obtained 
from the aliove address. 

Rbfehbncb; Starch, Daniel. "The Mea.furcment of Achievement 
in English Grammar"; in Journal of Educational Piychology, vol. 
6. pp. 015-23. See also Starch, Daniel, Educational Meamiremenit. 
i. Haggerly'a English Grammar Tesis. An arrangement of Buckingham's 
English Grammar Tests. Copies may be purchased from the Bureau 
of Cooperative Research, University of Minnesota, Minneapolis, 

6. Buckingham' i Engtiih Orammar Tests. Used in the Survey of the 
Gary and PcevocationaJ Schools of New York City. 

Reference: Secmlfenih Annual Report a} the City Superiniendent 
of Schools, New Yoris City, ]'J14-15. 

II. English Composition 

I. Harrard-Newton Composition Scale, devised by F. W. Ballon. Copies 
of the scale may be secured from the Harvard University Press, 
Cambridge, Massachusetts. 

References: Hudclson, Earl. "Some Achievements in the Es- 
tablishment of a Standard for the Measurement of English Compo- 
sition in the Bloomington, Indiana, Schools"; in English Jotimal, 
vol. S, pp. 500-97. (November, 1916.) 

Kayfetz, Isidore. "A Critical Study of the Harvard-Newton 
Composition Scales"; in PedagOfieal Seminary, vol. 23, pp. 325-47. 
(September, 1918.) 
S. HUlegas's Compoailion Scale. Copies may be obtained from the 
Bureau of PublicaUons, Teachers College, Columbia University, 
New York City. 

Ref'EBBNCib: "Hillegas Scale for Measurement of English Compo- 
sition"; in Teachers College Record, September, 1918. 

Johnson, P. W. "The HillcKas-Thorndike Scale for Measuring 
the Quality in English Composition by Y'oung People"; in School 
Review, vol. 81, pp. 39-49. 



■r 

f Ka: 

F Scale' 

I The 



LANGUAGE 



223 



Kayfetz, Isidore. "A Critical Study of the RiQegas Composition 

Scale"; in Pedagogical Seminary, vol. 81, pp. 559-77. 

Tiiorndike, E. L. "Notea on the Significance and Use of the 

HillegBJi Scale for Measuring the Quality of English Composition"; 

in English Journal, vol. i, p. 551. 
3. Thoradike Exteation of Ike HilUgas Seaie for the Meaaaremcnt of 

Quality in EnglUh Compofhion by Young People. No directions for 

use. Cities may be obtained froni Bureau of Publications. Teacbers 

College, Cojumbitt University, New York City. 
i. A Scale for Mtanuring Ike Genetai Merit of English CompoitUion in the 

Sixth Grade, by F. S. Breed and F, W. F-roOic. See EUmenlary Sclumi 

Joumal, vol. 17, pp. 307-85. 

5. a. H. WiUing's Compotiiion Scale. 

REFEBBNCEa: Report of the School Surtq/ of School Diitrict Ntanber J 
One in Ike City and County of Dertver, part ii, p. 69. 

Willing, M. H. Meaauremenl of WriUen English Composition in the ^ 
Public Elementary SchooU of Denver, Colorado. (Master's Thesia, 
Univetsity of Chicago. Unpublished.) 

6. The Na»3au County Siipplemenl to the HUlegas Scale, by M. R, Trabue. 
Copies may be obtained from tbe Bureau of Publieations, Teuchers 
College, Columbia Univeraity, New York City. 

Rsfbrencb; Trabue, M. R. "Supplementing the HiUegaa Scale"; 
in Teachers College Record, vol. IS, p. 51. (January, 1917.) 

7. Traiiut't Compldion-Test Language ScaUa. Copies may be obtained 
from the Bureau of Publications, Teachers College, Columbia Univer- 
sity, New York City. 

Refehence: Trabue, M. R. Completion-Teat Language Scales. 
(Teachers CoUege CoDtributiona to Education, no. 77, 191S.) 



CHAPTER Vn 

mCH^CHOOL SUBJECTS 
The preceding five chapters have dealt with the "tool" 
subjects in the field of elementary education. In each case 
the outcomes of instruction were specific habits. In tlie 
hi^-school field other outcomes of instruction predom- 
inate. These outcome.s of instruction are less definite and 
tangible, and consequently the problem of measurement 
involves greater difficulties. This condition, coupled with 
certain others, probably accounts for the scarcity of tests 
in the high-school field. In this chapter a brirf description 
is given of the tests which have been announced. In only a 
few cases have the tests been used sufficiently to give an 
indication of their effectiveness. 



I. Algebra 
The problem of measurement. As in treating other sub- 
jects, the first step in considering the problem of measure- 
ment in the case of algebra is to determine the fundamental 
outcomes to be realized in the teaching of the subject. 
Among the specific outcomes ' are the comprehension and 
manipulation of symbols in performing the operations ot 
elementary algebra. These operations are used as tools in 
the solution of equations and (^ "practical" problems. 
The manipulation of symbols must be automatic tor the 

' For a complete statement of the outcomes of instnirtion m elementaiy 
algebra see Ku^, H. O. "Tbe Eiperimentsl DetenuinatioD of Standards 
in Fbst-Year Algebra'"; in Sdiooi RerUr. W. 45. pp. 37-68; i(Kr-213, 



HIGH-SCHOOL SUBJECTS 33.J 

same reason that the most commonly used words must be 
spelled automatically or that the operations of arithmetic 
must be performed without focusing the attention upon 
the activity. As in the case of arithmetic each type of ex- 
ample requires a specific abiUty. Hence it is necessary 
to determine the fundamental types of examples in ele- 
mentary algebra. 

The fundamental operations of algebra. The plan of our 
texts suggests that algebra is simply generalized arith- 
metic, that is, that it consists primarily of performing the 
operations of addition, subtraction, multiplication, and di- 
vision with algebraic symbols instead of with the Hindu 
characters used in arithmetic. Thus the fundamental opera- 
tions of algebra appear to be addition, subtraction, multi- 
plication, and division. This is not entirely true. The equa- 
tion is the principal tool or instrument of algebra in dealing 
with "practical" problems. Practically all of elementary 
algebra is grouped about the equation. In fact algebra 
has been defined as the science of the equation. From this 
point of view the fundamental operations of algebra are 
those required in the solution of equations. 

In elementary algebra a very large per cent of the 
" practical " problems require only simple equations. Many 
of the simple equations will contain fractions with numeri- 
cal denominators, but very few of the problems will result in 
fractional equations having the unknown quantity in the 
denominator.^ Therefore, from the standpoint of usefulness 

' See Monroe, Waller S. "An Experiment in the OrBanizatioo and 
Tencbing of FiaMt-Year Algebra"; in School Science and MaUiematics, vol. 

12, pp. ees-si. 



826 EDUCATIONAL TESTS ANB MEASUREMENTS ■, 

the first group of fundamental operations of elementary 
algebra are those which occur in the solution of simple 
equations containing fractions with numerical denominators. 
The operations which are necessary for solving quadratic 
equations and simultaneous equations would form other 
groups of fundamental operations. To learn the group of 
fundamental operations which are required to solve simple 
equations consider the solution of this equation: 



Clearing equation of fractions 

— 181 + 3 + 96i + 198 = 6(te + 80 
Transposing terms 

— 18^ + 96a: - 60a: - 80 - 8 - IW 
Collecting Terms 

]aE= - 115 

Finding value of a: x= — 

The operations involved in the solution of this equation 
are: (1) clearing equation of fractions, (2) transposing terms. 
(3) collecting terms, and (4) finding value of x. Clearing 
an equation of this type of fractions involves the multipli- 
cation of a binominal by an integer, and under certain con- 
ditions the multiplication of a binomial by a binomial. Col- 
lectmg terms is a very simple type of addition and subtrac- 
tion. Finding the value of ^ is a special form of division. 
In the solution of some non-fractional equations the multi- 
plication of one binomial by another will occur. 

In a similar manner the operations which occur in the 
solution of quadratic equations and simultaneous equations 
may be determined. The operations required in the solu- 



fflGH-SCHOOL SUBJECTS S27 

tion of these two types of equations, together with those 
required for the simple equation, constitute the fundamental 
operations of elementary algebra. There will be no need 
for factoring until quadratic equations are taken up, and 
the operation is not necessary then. Exponents beyond the 
square are rarely used. 

At least one attempt ' has been made to determine the 
fundamental operations of algebra by analyzing the content 
of currently used textbooks. The fundamental operations 
determined by this method are given as follows; ■ — 

1. Removal of pareuthes 
i. Combiiiing terms 

3. Subtraction 

4. EvaluatioD 

5. Special producte 

6. Factoring 

7. Expoucnts 



8. Clearing of fractions and 

fractional equations 

9. Quadratic equations 

10. Graphing of equations 

11. Solution of "practical" for- 

13. Simultaneous equations 



This determination is subject to the same limitations as 
must be placed upon all statements of the consensus of 
present practice. It involves the a priori acceptance of the 
traditional subject-matter of elementary algebra as a satis- 
factory basis. 

In order that the results of using a test may be significant 
to the teacher, it is necessary that it measure the ability to 
do something which has a fundamental importance. For 
example, a test which measures the ability of pupils to per- 
form long division in algebra furnishes the teacher with in- 
formation of little importance, because long division is not 

' Eugg. H. O., and Clark, J. R. "Standardized Tests and the ImpMve- 
meot of Teaching in First-Year Algebra"; in School Recieic, vol. 25, pp. 
1S7-^; 190-S15. 



«28 EDUCATIONAL TESTS :\KD MEASUREMENTS 

used in dealing with problems. In fact an examination of 
an algebra text will reveal that the pupil has practically no 
opportunity to use this operation after he leaves the topic 
of long division. 

Standard Research Tests in Algebra. Upon the basis of 
the analysis of the solution of simple equations tlie Stand- 
ard Research Testa in Algebra ' were constructed. These 
tests consist of a series of six tests. Each of the first five 
tests is designed to measure the ability to do one of the 
operations occurring in the solution of simple equations. 
The tests are; — 

Teat I. ±a (± &f±c), o, b. and c, being not greater than 9 
and not all positive- 
Test II. Clearing equations of fractions. 
Test III. Solving for x, a special case of division. 
Test IV. Transposition. 

Teat V. Collecting terms, a special case of addition and sub- 
traction. 
Test VT. Simple equations to be solved. 

In giving the tests each pupil is provided with a printed 
copy of the exercises to be done. A definite time is allowed 
for each test. The ability of a pupil is measured by the num- 
ber of exercises he does in a given time, and by the accuracy 
of his work. 

Other algebra tests. A number of other tests have been 
devised to measure algebraical abilities. 

Thomdike has devised a test which consists of a series of 
exercises arranged in order of increasing difficulty. The rela- 
tive difficulty was determined on the basis of the judg- 

> Monroe. Wnlter S. "A Test oF the Attainment of First- Year High- 
School Students in Algebra"; in School Review, vol. 83. pp. 16&-71. 



^ 



r ment 

I follow 

If. 



mCH-SCHOOL SUBJECTS 



ments of two hundred teacters of algebra. The scale is as 
follows : ' 

4 and 6 = 2, what does a + b equal. Answer 

4 and 6=0, what does a+ b equal? Answer 

If K + 3o — 5a, what does x equal? Answer 

Fiad tJie average mitlnight temperature for the week in which 
the daily midnight temperatures were 15, 3, 0, —7, —9, 6, and 17 
degrees. Answer 

If = —, what does j equal? Answer 



- =0, what does x equal? Answer . 



A man has a hours to spend riding with a friend. How far can 
they ride together, going out at the rate of 6 miles an hour and just 
covering the return trip at the rate of c miles an hour. Answer 

How much water must be added to a pint of "alcohol. 95 per 
cent pure" to make a solution of alcohol, "40 per cent pure"? 
Answer 

Given that 2j: - 3 ia less than j + 5. and that 11 + 2* is less 
than 3x+ 5, to find the limits (i.e., the values) between which 

Two series of algebra tests, consisting of 12 tests each 
have been devised. One has been devised by Rugg and 
Clark ' and the other by Ckilds. 

In this second series the Standard Research Tests described 
above were incorporated. It is interesting to note that the 
two series of tests, each consisting of the same number of 
separate tests, agree on only five operations: (1) multipli- 

' So far as the writer knows copies of the teat are not available for dis- 
tribution. See Riigg. H. 0. "The Experimental DrtCTniination of Stand- 
ards in First-Year Algebra"; in School ReHew, vol. 86, p. 13. 

' hoc. ejl., pp. 37-66. In the most recent publication of these teats the 
number of tests in this series bu been increased to sixteen. 



r 



830 EDUCATIONAL TESTS AND MEASUREMENTS - 

cation of a binomial by an integer, (2) factoring, (3) solu- 
tion of simple equations, (4) stating problems, (5) solving 
simultaneous equations. The series devised by Rugg and 
Clark include tests for the following additional operations: 
(1) product of two binomials, (2) evaluation of algebraic 
expressions by substitutions, (3) involution and operations 
based on other laws of expancnts, (4) quadratic equations, 
(5) graphical solution of simultaneous equations, (6) evolu- 
tion, (7) solving formula for certain quantities. The series 
devised by Childs include tests for the following operation 
in addition to those which are common to the two series: 
(1) addition and subtraction, (2) finding the value of x in 
such equations as 5x = — 31, (3) long division, (4) trans- 
' position, (5) collecting terms, (6) clearing equations of frac- 
tions. (Two testa are devoted to the stating of problems.) 

Conclusions from the tests. The fact that there are so few 
points of agreement among those who have devised tests 
in the field of algebra is significant. It indicates, in the first 
place, that the fundamental operations of algebra are not 
simply addition, subtraction, multiplication, and division, 
as in arithmetic. In the second place it indicates that the 
textbook writers and teachers of algebra have not yet de- 
termined the operations which are fundamental. In the 
analysis of the subject-matter of algebra given above the 
writer has indicated his belief that the fundamental opeia- 
tions of algebra are those required in the solution of equa- 
tions. The acceptance of this premise furnishes a definite 
basis for constructing a series of tests. 

Standards. The Standard Research Tests in Algebra 
have been given in a number of high schools. The following 



HIGH-SCHOOL SUBJECTS 



median scores, based upon returns from twenty-one cities, 
may be taken as tentative standards: — 



Tat 


/ 


11 


in 


IV 


y 


VI 


Number o( Pupih 

Speed, number of examples 

attempted 

Accuracy, per cent of ex- 


8077 

li.e 

S6 


1993 

S.4 
41 


2107 
11. li 

,00- 


2127 

10.2 


219S 
11.2 

77 


1992 
B.3 







Tentative standards for certain tests oE the series devised 
by Rugg and Clark are given in the School Review, vol. IS, 
pp. 122-23. These standards are also printed on the class 
record sheet which is furnished with the tests. Childs has 
recently published a report of the use of his series of 
algebra tests in fifteen cities.' 

Meeting the teaching situation revealed by algebra 
tests. The mental processes of algebra are similar to those 
of arithmetic in many respects. In both subjects the opera- 
tions must be performed automatically in order to free the 
attention for the doing of other things. The abilities are 
specific. In arithmetic it was shown that addition involved 
not simply one ability but several abilities. (See page 18.) 
Each type of example required a specific ability. A scientific 
analysis of algebraical abilities has not been made, but it is 
probable that in algebra each type of example requires a 
specific ability if it is done automatically. The engendering 
of arithmetical abilities is based upon the laws of habit 
formation. These laws also apply to the teaching of the 

■ Childs. Hubert G. "The MeBsurement of Achievement in AlBebra," 
in the Third Conference on Educational Meajturementa. (BvUetin, Ex- 
tension Division, Indituu University, vol. 2, no. 6, pp. 171-83.) 



232 EDUCATIONAL TESTS AND MEASUREMENTS , 

operations of algebra. Individual differences complicate 
the teaching of arithmetic. They have been shown to be 
equally conspicuoiis in algebra. 

Class instruction posseaaes the same weaknesses in alge- 
bra as it does in arithmetic. The writer has observed the 
plan of giving drill, described on page 55, in algebra classes 
as welt as in arithmetic. The result was the same in both. 
Each pupil needs drill upon the types of examples he does 
not do well. Practice tests for algebra ' can be devised upon 
the same principles as those for arithmetic. The sugg3stions 
given on page 59, for adapting the instruction to the needs 
of the pupils, may be applied to instruction in algebra as well. 

Rugg and Clark state that " It has been shown that suc- 
cess in teaching algebra depends primarily on the teacher's 
knowledge of the typical difficulties which the pupils will 
meet in learning algebra." In the reports of the use of al- 
gebra tests the errors made by pupils have been studied. 
In all cases the number of errors has been large. Table XXV 
gives the types of errors which were made by two hundred 
and seventy-five first-year pupils on the Standard Research 
Tests in Algebra, The tests were given in March, 1914. 
Rugg and Clark have used a more elaborate classification 
of errors, but their results indicate the same condition. 

The conditions revealed clearly indicate that the pupils 
have not been given sufficient satisfactory drill to make au- 
tomatic the performing of the simpler operations of algebra. 
Before this can be accomplished, teachers must determine 
what the important or fundamental operations of algebra 

' See Rugg, H. O., and Clark, J. R. " Standard! ni^l Test^i Bud the Im- 
provement of Teaehing in First- Year Algebra"; ia School Reriew, vol. 25, , 
pp. 196-213. (Much, 1917.) 



fflGH-SCHOOL SUBJECTS 833 

are. When this has been accomplished they must further 
detennine what types of eserciaes occur in each operation. 
For example, assuming that performing the indicated opera- 
tion of a {bx + c) ia fundamental, it is obvious that these 
types of exercises occur, a (63: + c),—a(bx +c), a { — bx + c), 
-a{-bx + c), aibx-c), -aibx~c}, -a{~bx-c), 
a (— 6x — c). In the studies referred to it was found that 
each required its own specific ability. This being the case 
the teacher must provide satisfactory drill upon each type. 
A series of scientifically constructed practice exercises fur- 
nish a means for doing this. 



Table XXV. Classification op Ebborb made b- 
HuxDBED AND SEVENTY-nvB Fibst-Yeab Pupils ( 
Standabd Reseasch Ti»t8 (Aloebka) 



Two 



Mistake in sign 

MisUbke in the commoii denoDiin^ 

Mistakes in arithmetic 

Mistakes in copying 

One term of binomial not multi- 

A term neglected 

X omitted 

Inconqilete as — x — B 



II III IV 



II. Geometbt 

A geometry test of seventy questions, arranged in twenty 

groups, has been devised by Stockard and Bell.' "These 

' Stockard. L. V., and Bell. J. C. "A PreSiminary Study of tlie Mes»' 
mentofAhilitiea in Geometry"; in Jovnud of Eduealional Piiyc'^ 
7, pp. 587-80. 






834 EDUCATIONjU, TESTS AND MEASUREMENTS 

groups involve drawing figures, naming figures, indicating 
order of development in demonstration, completing state- 
ments, stating of converse^ definitions, regular polygons, 
parts of a demonstration, angular relations, area of trape- 
zoid, angles in polygons, angles in circles, congruency of 
triangles, similarity of triangles, loci, auxiliary lines, simple 
constructions, ratio and proportion, algebraic expression of 
geometrical relations, and equivalent construction." The 
questions are asked in such a way that many pupils are able 
to complete the list in forty minutes. On the basis of the 
results of giving the tests to 372 students the difficulty of 
the questions has been expressed in terms of a common 
unit. 

m. Foreign Languages 
Foreign languages involve some of the same mental pro- 
cesses as occur in the field of the English language. A pupil 
must associate the correct meaning with words and groups 
of words. In using a foreign language to express ideas, the 
rules of grammar which govern the form must be followed. 
Hence the problem of measurement is similar in several 
respects. 

Starch's foreign language tests. Starch has devised both 

vocabulary and reading tests for Latin, German, and 

*''Nch.' The same plan has been followed for each language. 

^S. ^Q'y **^ Latin test will be described here. The words of the 

I - Latm ■v(>-,j^jj^jy (.pgt ^epe selected from Harper's Latin 

I 't-m- 'fakiogthefirst word on every twentieth page. 



■f one hundred words. In addition to the 
Starcb, D, _^ 

^'icatianal Measurements, pp. 171-87. 





HIGH-SCHOOL SUBJECTS 23j 

Latin words the English equivalents are given on the test 
paper. Both lists are arranged alphabetically. The test 
consists of associating with each Latin word its English 
equivalent. The Latin reading test consists of sentences 
selected from first-year Latin books, Csesar. Cicero, and 
Virgil, and the reading tests in German and French are 
composed of simple sentences. The sentences in each of the 
tests are arranged in order of increasing difficulty. 

The vocabulary test measures the extent of the pupil's 
"vocabulary." However, it maybe that this " vo<;abulary " 
is not the same as the vocabulary which he is able to use in 
translating Latin into English. Both the vocabulaiy tests 
and the reading tests are simple to use. 

The Hanus Latin tests. Hanus has also devised a group 
of Latin tests ' which consists of four tests for vocabulary, 
a translation test, and a grammar test. All of these tests 
are based on Ciesar and Cicero. No words appear in the 
vocabulary tests " which occur less than one hundred times 
in Caesar and Cicero." The translation test " contains only 
constructions which are found at least five hundred times in 
Csesar and Cicero." The grararaar test is based on the sen- 
tences to be translated. The vocabulary test differs from 
the one devised by Starch in that the pupil must write the 
English equivalent. 

Henmon's Latin tests. V. A. C. Henraon, of the Ur**'"- 
sity of Wisconsin, has devised a series of Standa^ '" 

Latin which consist of three tests. First, anj^ ^ .7^*^^' 
word vocabulary test — fifty in Enj,_ 

' Hanua, Paul. "Meaauring Progress ii 
Aencw, vol. 24. pp. S13-51. ' 



288 EDUCATIONAL TESTS AND MEASUREMENTS 

contaming the words that are common to four widely used 
first-year books. Second, a Standard Vocabulary Test of 239 
words representing all the words common to thirteen first- 
year books and to Csesar, Cicero, and Virgil. Third, the Latin 
sentence test consists of thirty sentences constructed by us- 
ing none but the 239 words of the Standard Vocabulary Test. 

A pupil's vocabulary is important in the study of a foreign 
language, as well as in the field of English. Hence in using 
a vocabulary test the teacher is measuring an important 
ability. The test will reveal that some pupils have much 
smaller vocabularies than others. By comparing the class 
score of his pupils with the standard a teacher may know 
whether he is placing sufficient emphasis upon the feature 
of the language which the test measures. A translation test 
gives a general measure of the ability of a pupil to translate. 
It ako furnishes the teacher with a means for comparing 
her class with an objective standard. 

No devices for remedying the shortcomings which tests 
reveal have been worked out. Several of the devices given in 
the treatment of reading and language can easily be adapted 
to the teaching of a foreign language. As in the case of 
other subjects, the tests will fulfill an important function 
if they do nothing more than demonstrate to the teacher 
that he is instructing a group of pupils who differ widely, and 
that his efforts will be most effective when each pupil is 
given the instruction which he needs. 

IV, Physics 
Starch has also devised a series of tests in physics, cover- 
ing mechanics, heat, sound, hght, and magnetism and 



IL 



~\ 



fflGH-SCHOOL SUBJECTS «37 

electricity.' The tests consist of sentences, from which 
words have been omitted. The sentences and the words to 
be omitted have been chosen so that a pupil cannot supply 
the correct words unless he knows certain physical facts 
or principles. The following sentences will illustrate the 
tests: — 

17. The periods of penduluma of equal lengths swinging through 

short arcs are independent of and also independent 

of 

30. The point degrees below degrees 

centigrade is called 

39. The frequency of vibration of a string varies inversely 



53. The critical angle is that angle of incidence which will pro- 



55. Electromotive force is the difference in be- 

The tests consist of seventy-five mutilated sentences of 
this type. The facts, principles, and laws upon which these 
sentences are based were determined by examining five 
widely used textbooks. The one hundred and two facta, 
principles, or laws which were treated by all five of the text- 
books are the ones which the pupils must know to do the 
tests correctly. These facts and principles probably are the 
ones fundamental to elementary physics. If the tests do 
nothing more than define the fundamental facts and princi- 
ples they will fulfill an important function. However, they 
also give the teacher a means for comparing his class with 
other classes. 

• Starch, D. Educationai Meaturemenls, pp. 188-92. 



238 EDUCATIONAL TESTS AND MEASUREMENTS 

V Other Tests which mat be used in the Hiqh ■ 
School 

Certain tests of those described in the preceding chapters 
are intended to be used in the high school, as well as in the 
elementary school. These are Test III, of the Kansas Silent 
Reading Tests; the Thorndike Scale AJpha for Measuring 
the Understanding of Sentences; Starch's English Vocabu- 
lary Test; the composition scales; the copying test; the 
Trabue Completion-Test Language Scales; and Starch's 
Grammatical Tests. For the description of these tests and 
the use of them the reader is referred to the preceding 
chapters. 

In addition to these tests many of the others have been 
applied to high school pupils. For example, the Courtis 
Standard Research Tests in Arithmetic, Series B, have fre- 
quently been given to high school pupils, although many of 
them were not studying arithmetic. However, in applying 
such tests to high-school pupils it should be remembered 
that the tests were not designed for that purpose. Conse- 
quently it may be expected that the tests will not be as satis- 
factory as when used in the way they were intended to be 
used. 



QUESTIONS AND TOPICS FOR INVESTIGATION 

I. How do you accoiinl for the fiLct that Ipns progress haa been made 

in devising teata tor high-school subjects than for elementary school 

lubjecU? 
a. Which of the algebra teats described in llua chapter do you think 

would be most helpful to a teacher? Why? 
S. Compare the Latin vocabulary tests which arc described. Which one 

do you think would be most helpful to a teacher? Why? 





mCH-SCHOOL SUBJECTS SS9 

4. Is a test like llie physics tests superior to an ordinary examination? 

fi. The geometry teat described covers a very wide range of topics. Is 
tills necessary for a test in such subjects as geometry, physics, or 
history? 

6. If you are teoehiog algebra, study the errors which your pupils make. 
Suggest a plan for redudog the number of errors. 

7. Suggest KVem] plans for iDcre&sing the Foreign language vocabulary 
of pupils who have been shown to be below standard. How could you 
determine which of the plans ia the best. 

BIBUOGRAPHY 

I. AliGEBRA 

Only the moat important references are given here. Additional reterencea 
will be Found in the Footnotes of the chapter. 

1. Indiana Algebra Tatt, arranged by H. G. Childs. Copies may be ob- 
tained From the University Book Store. Bloomington. Indiana. 

RsFEiiENCE: Chiids, Hubert G. "The Measurement oF Acliieve- 
ment in Algebra"; in Third Conjennce on Edvcalionai ileamiremffnta. 
' (Bulletin cF the Extension Diviaion, Indiana University, vol. U, no. C, 
pp. 171-83.) 

5. Freiindiuiry Algebra TtsU, devised by C. E. Stroraquist. Copies may 
be obtaioed from the University of Wyommg, Laramie, Wyoming. 

S. Standardired TeMt in Firit Year Algebra, devised by H. O. Rugg 
and i. R. Clark. Copies may be obtained from H. U. Rugg, School 
of Education. Uoiversity of Chicago. 

Repbrenites: Rugg. H. O. "The Experimental Determination of 
Standards in First Year Algebra "; in School Redeic, vol. 24. pp. 37-66. 
(January, laiO.) 

Rugg, H. O., and Clark. J. R., "Standardised Tests and the Im- 
provement oF Teaching in First- Year Algebra"; in School Renew, vol. 
45, pp. 113-32 and 1B8-213. (FebruaTj and March, IfllT.) 
J. Standard tteseareh Tettt in Algebra, devised by Waller S. Monroe. 
Copies may be obtained from the Bureau of Educational Measure- 
ments and Standards. Emporia, Kansas. 

Reoihence: Monroe, Walter S. "A Test of the Altainmenl of 
Firstr-Year High-School Students in Algebra"; in School flew"™, 
March. lalS. 

5. A Scale far TtMing Ability in Aljebra, devised by W. H. Coleman. 
Copies may be obtained from W. H. Colemon, Bertrand, Nebraska. 

6. Tkomdike't Algebra Tat. 
Reference: Rugg. H. O. "The EKperlmeutal Determination of 

Standards in First-Year Algebra"; in School RevieiB, vol. 25, pp. 
37-66. {January, 1916.) 



6. Tkomd 

Refi 

L SUndu 

■ 37-66. 



240 EDUCATIONAL TESTS AND MEASUREMENTS 



U. Gbouetrt 
RErGKBNCE: StockBrd. L. V.. and Bell. J. C. "A Preliminaiy 

Study of the Measurement gf Abilities in Geometry"; in Jovnud of 
Educatvmal P^ehotom/, vol. 7, pp. fi87-80. 

m. FoBEiGN Lanquaoeb 

1. Frtach Voeaindary and ReadtJi^ Tiata, devised by Daniel Starch. 
S. Oerman Voa^nttarj/ and Reading Testa, devised by Daniel Starch. 

5. id/tJi VacabulaTy aiul Reading Teats, devised by Daniel Starch. 

Cofues of all three <if these teats may be Dbtained from DaDiel 
Starch, Umversity of Wisconsin, Madison, Wisconsin. 

Refebsnce: Starch, Daniel. Educafional Meamremrnts, chaps. 
xj, xu, and xm. (The MacmillBn Company.) 
i. Hanut's LaUn Teats. 

Reference: Hbdus, Paul H. "Meaauriug Progress in Learning 
Latin"; in School Review, vol. 24, pp. 8*8-61. (May, 1916.) 

6, Henmon'a Latin TeiU, Copies of the tests may be obtained from 
V. A. C. Heninon, University of Wisconsin, Madison, Wisconsin. 

IV. Physics 

1. Tata for Phjitiet, devised by Daoiiel Starch. Copies may be obtained 

from Daniel Starch, Univer^ty of Wisconsin, Madison, Wisconsin. 

Reference: Starch, Daniel. Educational MtamTemtTils, chap. xtv. 
(The MacmiLan Company.) 



STATISTICAL METHODS 

The teacher, principal or school officer making use of the 
standard tests previously described has but little use for any 
mathematical knowledge of statistical methoda. The httle 
knowledge that is needed relates much more to proper 
methods of tabulating and tke most effective methods of 
graphic representation. The few calculations that need 
to be made are arithmetical, and the statistical terminology 
needed is quite small. This chapter is intended to state 
briefly what will be needed, and to give a few typical illus- 
trations of good methods in tabulating and charting. 

Good arrangement of scores. The significance of a group 
of facts, such as the scores made by a class upon a test, 
may be made more evident by certain methods of arranging 
them, and by determining an index of the central tendency 
of the group. Take, for example, the scores which were made 
by a sixth-grade class of thirty-five pupils when given the 
Kansas Silent Reading Test. When these scores are pre- 
sented in the manner of Table XXVI the array tends to 
confuse. One must scan the entire array to learn that the 
lowest score is 4.2. or that the highest score is 30.1. One 
cannot easily learn that pupil 6B, who made a score of 
14.9, stands eighth from the poorest in the group. If now 
the scores are simply rearranged in order of magnitude, as 
shown in Table XXVII, their significance is much more 
easily grasped. 



' 










1 


242 EDUCATIONAL TESTS AND MEASUREMENTS 1 


Tablk XXVI. 


Showing a Poor Arbanoement of Scores 


Pupa 


Score 


i^a 


Socre 


Pupa 


s«^ 


A 


27.3 


M 


10.0 


Y 


16.0 


B 


19.8 


N 


16 




Z 


10.1 


C 


26.2 





21 




AA 


16.4 


D 


22, fi 


P 


25 




BB 


14.0 


E 


15.4 


Q 


21 




CC 


16.4 


F 


19.3 


R 


16 




DD 


14.1 


G 


28,1 


S 


16 




EE 


4.2 


H 


17.4 


T 


5 




FP 


so.o 


I 


25.1 


U 


30 




GG 


24.1 


J 


15.7 


V 


22 


3 


HH 


26.3 


E 


11,8 


W 


IS 


1 


n 


25.8 


L 


21 .Q 


X 


12 


8 






Table XXVU. 


Showinq th 


s Same Scobes rearranged in a 






Better Order 






P^pii 


s™™ 


p«pa 


Swre 


P^pU 


Scon 


EE 


4.2 


Y 


16,0 


V 


B2.S 


T 


5 


9 


S 


la.i 


D 


22.6 


M 


10 





N 


18.3 


GG 


24.1 


E 




s 


CC 


18.4 


I 


86.1 


X 


12 


8 


H 


17.4 


P 


2S.6 


w 


13 




^F 


18.3 


n 


25.S 


DD 


li 




Z 


19.1 


c 


26.2 


BB 


14 




B 


10.2 


HH 


28.3 


AA 


IB 




FF 


20.0 


A 


27.3 


E 


IS 







21.1 


G 


28.4 


J 


15 




Q 


21.1 


U 


30.1 


R 


15 


9 


L 


21.6 






The median. 


The eighteenth score is the middle one of 


this group. Such a score is 


called the ' 


median 


core," and 


indicates the "central tendency " of the 


group. The general 


standing 


of a group of pupils is expressed by 


he central 


tendency. In case there is 


m even nun 


aber of s 


=ore.s there 


is no middle score. In such a case the 


average 


of the two 


most ce 


itral scores may be taken, although for practical 


. k. 








1 



STATISTICAL METHODS 943 

purposes it will be satisfactory to take the lesser of the two 
middle scores. Thus, if a distribution contains forty-one 
scores, the middle score is the twenty-first score; if it con- 
tains forty scores, the twentieth score may be taken as the 
middle score. ^ The number of the middle score is obtained 
by dividing the total number of scores by two. This quotient 
expressed as the nearest integer is called the " half sum." 
When using a particular test the directions which accompany 
that test should be followed, because the medians used for 
standards have been obtained by following the accompany- 
ing directions. 

Frequency of scores. Usually scores are expressed simply 
in whole numbers. When this is the case, the distribution is 
simply the statement of how many scores there are of each 
magnitude. This is done as in Table XXVIII. The column 
labeled " Score " gives the scores arranged in order of mag- 
nitude. The column headed "Frequency" tells how many 
scores there are of each magnitude, or how frequently each 
score occurs. In this table there are two scores of 8, one score 
of 7, four scores of 6, etc. The total number of scores is 33. 



"The median score is the score on the middle paper in the pile of papers 
arranged according to size of scores. If there are thirty-five papers, the me- 
dian score Is the score on the eiehteenth paper. If there are thirty-sis papers, 
the median score is half way betweeo the score on the eighteenth paper 
BJid the score on the nineteenth paper." 

The directions which accompany the Courtis Standard Rescareh Testa in 
Arithmetic, Series B, read as follows: — 

"If there are thirty-seven children in the class, the nineteenth score in 
order of magnitude would be the median score; for there would be eighteen 
scores larger and eighteen smaller. If there were thirty-six children in the 
class, the eighteenth score would be taken as representing the nearest 
Bpproxinuitioii to the middle measure." 



244 EDUCATIONAL TESTS AND MEASUREMENTS 
Table XXVIIl. Showinq the Dihtribution or Scores 



Total 33 

Here the seventeenth score is the middle one, but it is 
not possible to identify its value directly. It is clearly one of 
the eight scores given as 4, because counting from the lower 
end of the distribution, 1 and 4 are 5, and 6 are 11, and 8 
are 19, which is beyond the middle of the distribution. The 
approximate value of this median, that is, the seventeenth 
score is 4,0, To determine the approximate median it is 
simply necessary to locate the interval of the distribution in 
which the median falls. 

To locate the interval in which the median falls, begin at 
the lower end of the distribution and add together the fre- 
quencies until the addition of the next one will make a sum 
greater than the number of the median score, or half of the 
total of the frequencies. This sum of the frequencies is called 
the partial sum. The median score is in the next interval, 
and the approximate median is the value of that interval. 

Intervals of distribution. In Table XXVII each scor« is 
' Frequency or number of pupils making the score. 



\ 



STATISTICAL METHODS 245 

listed separately. For certain purposes this makes the dis- 
tribution more complex and awkward to work with than if 
the scores were classified in a few groups. A convenient 
choice of groups for this array of scores would be as shown in 
Table XXIX. In the group or interval from 14.0 to 15.9 
there are six scores — 14..1, 14.9, 15.4, 15.4, 15.7, and 15.8. 
When they are grouped together each score loses its identity, 
but it should be remembered that the scores which are 
grouped together are not necessarily equal. In calculating 
the true median of a distribution, it is necessary to assign an 
identity to the middle score. We do this by assuming that 
the exact values of the scores included within an interval are 
uniformly distributed over the interval. 

Table XXIX. Arrangement of Scores nr Interval Groups 

'nitnvU Frtqutnoy 

to SI . 9 
to 29.9 
0loi7.9 
0to26.9 
to 23. 9 
to 81 . 9 
to 19.9 
Oto 17,9 
Oto 15.9 
Oto 13.9 
to 11. 9 
Oto 9.9 
Oto 7.9 
Oto S.9 
Oto 3.9 



6 ,1 



Many of the tests yield scores in terms of the intervals of 
distribution. This is true of the scores from the Courtis 



«46 EDUCATIONAL TESTS AND MEASUREMENTS 

Standard Research Testa, Series B, The pupil's score is so 
many examples attempted and so many right. From this 
it may appear that the scores should not be considered to 
be distributed over the interval, but it must be remembered 
that in marking the test papers no credit was given for ex- 
amples partly completed or for examples partly right. Thus 
all fractions have been dropped. This being true, it is ob- 
vious that the accurate measures of the pupils are really 
probably distributed uniformly over the interval. 

Approximate and true median. To calculate the amount 
to be added to the approximate median to make the true 
median, proceed as followsr (1) Subtract the partial sum of 
the frequencies from the half sura. The partial sum is found 
in determining the approximate median. (2) Divide this 
difference by the number of scores which are included in the 
interval in which the true median falls. Add this quotient to 
the approximate median. If the width of the interval is more 
than one unit, as in Table XXIX, the quotient must be mul- 
tiplied by the number of units the interval contains. In the 
case of Table XXIX the quotient would be multiplied by 2. 
It is well to carry the quotient to two decimal places, but in 
writing the median it should be expressed only to the nearest 
tenth.* 

The approximate median of the distribution given in 
Table XXIX is 18,0. The half sum is 18 and the partial 
sum is 17. The difference is 1, and this, divided by 3, the 
number of scores in the next interval, give a quotient of 
.33. Since the width of the interval is 2, .33 is multiphed 



STATISTICAL METHODS 

by 2. A correction of .66 is to be added to the approximate 
median, 18.0. This gives 18.66, which should be WTitten 
as 18,7, the true median of the distribution. 

~^ The average. Another centra! tendency is the average I 
or arithmetical mean. This is, however, much less used than ' 
the median, and usually is a much less accurate measure to 
use. The average of a group of scores may .be found by 
dividing the sum of the scores by the number of scores. 
However, if the group is a large one, this involves consid- 
erable labor. In such a case a short method may be used,* 
J The mode. Still another central tendency, though not i 
much used, is the mode. By mode we mean the most com- J 
mon score or measure. Thus, in Table XXVIII the mode is J 
4.0 and in Table XXIX it is 14.0. Average and mode, 
though, are but little used, the most commonly used central \ 
tendency measure lieing the median. 

Measures of variability; (I) average deviation. Both the I 
median and the average represent the centra! tendency of 4 
the magnitude of the group of scores. Frequently it is help- 
ful to obtain an index of the variability of the scores, that is, 
of the closeness with which the scores are grouped about the 
median or the average. The amount by which a score differs 
from the central tendency is called the deviation. A score 

, which is greater than the central tendency will have a posi- . 
tive deviation; a score which is less will have a negative de-l 
viation. The sum of the deviations without regard to sign.J 
divided by the total number of scores in the group, gives theJ 
average deviation, usually indicated by the abbreviation, A.D 

' For this method see King, W. I., The EUmtnlt of StaiMical Method, ' 
pp. 131'3T;oiThDmdike,E. L., Mental andSoei<dMea«iTementt,pp.ii-iS. 



M8 EDUCATIONAL TESTS AND MEASUREMENTS 

In calculating the average deviation of a group of scores, 
which are listed aa being within an interval, it should be 
remembered that the scores are to be considered as being 
distributed evenly over the interval. Thus their average 
value is at the middle point of the interval. If the inter- 
val is from 12.00 to 12.99, the mid-point is 12.50, and in 
calculating the deviation each score in the interval from 
12.00 to 12.99 should be treated as having a value of 12.50. 

The method may be illustrated by calculating the aver- 
age deviation of the distribution given in Table XXX. The 

Table XXX. Showing the Calcdiation of the Average 
Dbviatiox 


inltnaii 


/VfjMncSF 


Serialim 


FrequeHcp 


wn. tn 




2 
8 


IZ.3 
10.3 
8.3 

e.s 

4.3 
«.3 
.3 
—1.7 
—3.7 
—5,7 
—7.7 

—18.7 


1 


9^ 

ee 

80 
18 

1! 
10 

6 
4 

B 


0toa9 
Oto*7 
0ta»5 
0t<>23 
OtoSl 
Ota 19 
toll 
OlolS 
OtulS 
Otoll 
Oto 9 
Oto 7 
Oto S 
Oto s 




10 

24 
25 
8 
S 

—8 


3 


















11.4 
15,4 

37. 4 




















SB 


^\- 


186.3 






True median 18.7 

Average deviation — ~ "8.3 1 



STATISTICAL METHODS 



median is 18,7. The average of each group of scores, sub- 
tracted from the median, gives the deviattoos. When there 
is more than one score in an interval the deviation must be 
miiltiplied by the frequency to obtain the total deviation. 
The calculation of the average deviation may be shortened 
by using the approximate median, and then correcting 
the result.* 

(2) Percentiles ; quartiles ; probable error. Another 
measure of the variability of scores may be found by deter- 
mining the points on the scale between which the middle 
50 per cent of the scores are included. The lower one of these 
two points is called the 25 percentile. It b the point below 
which there are 25 per cent of the scores. The upper one is 
called the 75 percentile. One half of the difference between 
these two points is called the quartile range, or simply Q. 
In case the median is used as the central tendency, the 
quartile range is the same bs the median deviatJOD or 
probable error (P.E,). 

The calculation of the 25 percentile and the 75 percentile 
is similar to that of the median, which is sometimes called 
the 50 percentile. The only difference is that instead of the 
half sum, the one fourth sum and the three fourths sum 
are used. 

For the distribution given in Table XXIX, the one fourth 
sum is 9, The approximate value of the 9th score is 14.0, 
The correction is 1.0, making the true value of the 85 per- 
centile 15.0. The three fourths sum is 25. The approximate 
value of the 25th score is 22.0. The correction is 1.0, and the 



^ 



250 EDUCATIONAL TESTS AND MEASLTtEMENTS 

true value, of the 75 percentile is 23.0. The range of the 
middle 50 per cent of the scores is from 15.0 to 23.0, or 8 
units. The value of Q is one half of 8 or 4 units. This means 
that 50 per cent of the scores contained in this distribution 
are contained within an interval of 4 units on either side of 
the median. The quartile range or Q is easy to calculate, 
and for many purposes is the most significant measure of 
variabihty.' 

Correlation. The Courtis Standard Research Tests, Series 
B, measure simultaneously two related factors of a pupil's 
ability to do the operations of arithmetic with integers, 
— speed or number of examples attempted, and the accuracy 
or per cent of examples done correctly. In the measiu^- 
ment of handwTiting scores of speed and quality are obtained. 
In reading the pupil's rate of reading and his comprehension 
are measured. 

When pairs of scores are to be arranged in distributions 
it is convenient to use the phin shown in Table XXXI. 
This table shows the tabulation of the handwriting scores 
for a fourth-grade class. This table is read as follows: 
four pupils wrote at a rate between 20 and 29 letters per 
minute. The handwriting of one of these pupils was judged 
to be quality 20. one 40, one 50, and one 60. The line at the 
bottom marked "total" gives the distribution of the scores 
for quality. The column at the right marked " total " gives 
the distribution of the scores for speed. The tabulation 
shows the relation between speed and quality. The relation- 
ship of these two quantities is not constant. Certain pupils 

' For methods of cslcukting the mean square deviation the reader is 
refened Ui Thomdike, E. L., Meriiat and Social lUeatuTemenUt pp. 47-SS. 



L. 



w 

I possess a 



STATISTICAL METHODS 



I high degree of both speed and accuracy. Others 
are low in one and high hi another. This type of relationship 
is called correlation. The degree of correlation may be ex- 
pressed by a coefficient of correlation.' 

Table XXXI. Distribction or ScoHEa in HANDWHirifJo j 
poa A Fourth Ghade showing the Arrangement of Two j 
Kinds of Scores to show Cohrelation 



951 ^H 

thers 1 





QuaVi(!,«w«(Jyr«i«/r) 


i%4 




BO 


30 


40 


so 


60 


,0 


«, 


yo 




1 

1 
3 

» 


1 
2 

2 
S 


2 
6 


1 
1 

j 

3 


2 
1 

7 

8 


: 


: 


: 


1 










































Total for quality 


- 




- 


38 



Median: — Quality. 48; Speed, 63. 

Coefficient of correlation. If two seta of quantities a 
related in pairs so that when one is large the other is < 
large, and when one is small the other is small, there is said 
to be a positive correlation. If there are no exceptions to 
this relation, the correlation is called perfect, and the co- 
effinentis +1.00. If the two sets of quantities are related so 
that when one is large the other is small, the correlation is I 

' For the method of calculatini? the coeffioient of correlation the read^ J 
is referred \a E. L. Thorndike. Menial and Social MeaaUTements, pp. 150-83. 



853 EDUCATIONAL TESTS AND MEASUREMENTS , 

negative, and a coefiBcient of —1.00 indicates perfect nega- 
tive correlation. A coefficient of zero or near zero indi- 
cates DO correlation, that is, ao constant relationship exists 
between the two sets of quantities. In general, coefficients 
between +.S0 and —.30 should be interpreted to mean 
that no significant correlation exists. However, coefficients 
of correlation must be interpreted with care. 

For some purposes a distribution, such as is shown in 
Table XXXI, is more significant than the coefficient of 
correlation.. The study of such an array will reveal the 
general relation between the two abilities which prevails. 
If the scores are grouped near the diagonal from one corner 
of the table to the opposite one, a significant correlation 
exists. Sometimes this is brought out better by using a 
graphic distribution on a chart, rather than a statistical 
table form. This is well shown in Fig. 15. The college marks 
were determined by giving 6 for A, 4 for B, 3 for C, 1 for D, 
and for F, and then adding the marks for the courses. Thus, 
five A's would give a total of 30, five B's a total of 20, etc. 
Some correlation is shown, but it is not marked. 

In the chapter on arithmetic it was stated that each type 
of example requires a different ability. This is shown by giv- 
ing the same pupils two tests, each test being confined to a 
single type of example. When the two sets of scores are 
tabulated so as to show the degree of correlation existing 
between them, it has been found that little correlation exists. 
This means that in general the possession of the ability to do 
examples of a given type is no assurance that a pupil 
possesses an equal degree of ability to do examples of a differ- 
ent type. 



HP^B STATISTICAL METHODS 2^3 

quently aids one in interpreting a group of facta. The gen- 
1 eral principle involved is that numerical quantities are i 

I School Erades at emisnce | 

1 soesTTsassMBsioa I 






















































^l-'-'-'-'-"-'"u'W^M' 






il! '4rr^-- 






















































Fig. is. Showinq the Relation or Standing of 13S Colleob 
Stodbstb, in theib Entrance ExAMiNATioNa, to marks *ud« 
IN IHEUR FacaHMiN Yeah in College. 
{AllfrTborndike.) Esch dut repRKnti a itudint'i nrord. The college ourki vne 

drlninin«I by giving fl for A, t tor B, 3 for C. Hot D, 4nd Inr F. and Ui»n adding 

tit ma.l» far Ihc con™. Thu>. flv. A', would givt a toUl of M. fi« B>, » tolal ol 

to, etc. Some carRlaCioD a ihown, but thi> ii not marked. 

represented by the length of lines, distances from lines, or 
points, or areas, a certain length, distance or area having i 
been chosen as a unit, ' 
Certain types of quantities may be represented by a line. ' 
Take for example, the standards for Starch's SpelUng Tests. 
(See page 130.) They are represented graphically in Fig. 16 
by lines of appropriate lengths. This is often easier to grasp 
than a statistical form of statement. The meaning of the 



1 



Third andt v 

Faortb snde h 

FUth snde ■ 

Sixth mrade h 



Number at wocili HpeLLed correctly 
Fio. 18, A Gku-sical Rbprehemtation o? the Standaud Scores 

FOB STAECH'a SPELUNQ TeSTB. (Sm p. 130.) 





V 






























B 




Tl 








, 


^ 


-^ 












D. 


^ 












/' 


^ 














^^ 


/ 














J 


" 




1 1 


1 t 






\ V 


U V 


lU 



nhevl of the itaodatd tor thoj gnde; Ihoae below are behind. 




STATISTICAL METHODS 



r kligth of the lines is ascertained by comparison with the 

I scale at the bottom of the figure. 

Another plan for representing graphically the same facts 
is given in Fig. 17. The base line Ox and the vertical line 
Oy are lines of reference. In drawing this type of graph, 
points are located whose perpendicular distances from the 
lines of reference represent the pairs of values. The distance 
















A 






s 










f 






i" 

5 








/ 








p. 
.s 






; 










s'^ 






















c^ 












Fra. 18. 


Ebprese 


NTINQ G 

Han 


uredon t 


iO 
eAyrasS 

TTBE&l 


ale 


K) 9 



of the point C from the line Ox represents the number of 
words spelled correctly. The meaning of this distance is 
read from the scale on Oj. In the same way the distance 



856 EDUCATIONAL TESTS AND MEASUREMENTS -, 

of C from Oy represents the school grade. Thus, the positioD 
of the point C represents graphically the fact that the stan- 
dard for the third grade is 40 words. The representation 




stands out more clearly when the points are joined, as has 
been done in Fig. 17. 

Representing three quantities. In the case of handwriting 
there are three quantities to represent — speed, quality, 
and school grade. The standards for the Ayres Scale (see 
page 168) may be represented as in Fig. 18. The speed and 
quality are bath represented as well as the school grade. 
Roman numerab inside the small circle give the school grade 
to which the standards represented by the position of the 
circle belong. 

Representing many quantities. Fig. 19 illustrates a plan 



STATISTICAL METHODS 237 

of gmphical representation which is useful in represent- 
ing the standing of a pupil or a class in several teats. The 
scales on the several lines have been so chosen that the 
gtandajds for any grade he on a vertical line. All of the 
standards for these tests for the fifth grade lie on the line 
marked V, those for the sixth grade on the hne marked 
VI, and so on. The scores of a pupil or of a class are marked 
on the appropriate horizontal lines. When these points are 
connected, we have a graphic diagnosis of the abihty of the 
pupil or of the general ability of the class. 

QDESTIONS AND TOPICS FOE INVESTIGATION 

1. What U a distribution of scores? 

2. What is a central t«ndeDoyP 

8. Calculate bath tbe median and the average for a typical distribu- 
tion. Compare the two ceatrel tendencies. 

4. Do extreme scores affect the average? Do Ihey affect the median? 

5. Which of the three central tendencies is the most satisfnctoiy for 
general use? Why? 

0. Tabulate the bandwriting scores of a class to show the correlation 
between speed and quality. 

7. What are the advantages of graphical representation? 

8. Why is a measure of the variability of a distribution needed in ad- 
dition to tbe central tendency? 



CHAPTER IX 

TFIE MEANING OF SCORES 

Dldefiniteness of school marks. The result of measuring 
a physical object ia expressed in terms of certain units, such 
as the foot, pound, square yard, or bushel. Likewise the 
result of measuring the abilities of pupils ia expressed in 
terms of units. The speed at which a pupil is able to write 
is expressed in terms of letters per minute. The rate at which 
a pupil can add is exjjressed in terms of the ability to do a 
unit example. In the case of the Courtis Standard Research 
Tests, Series B, the unit addition example consists of three 
columns of nine figures each. The measure of an ability 
expressed in terms of a unit ia usually called a score. 

We apply certain descriptive terms to physical objects. 
By saying that a man is " tall" we express the fact that his 
height is greater than that of the average man. Since we know 
the height of the average man, a "tall" man is interpreted 
as one whose height is near six feet. However, a "tall" 
tree is not one which is six feet in height. A room eight- 
een by twenty feet would be a "large" kitchen, but a 
"small" classroom. What descriptive term or meaning is 
applied to an object depends upon a standard of size aa well 
as upon the magnitude of the particular object. 

School marks or " grades " are descriptive terms, similar 
to "tall," "large," "short," "heavy," and the like. Words 
such as "fair," "good," "excellent," "poor," "superior," 



THE MEANING OF SCORES 



and " medium" describe the attainments of pupils in com- 
parison with a standard. 

They do not express measures of pupils' abihties, but 
stead they are the meanings which teachers assign to meas- 
ures of the abihties of pupils. To say that a fourth-grade 
pupil is a "superior" reader indicates that he possesses a 
higher degree of ability to read than the average fourth- 
grade pupil, but it does not teLl how rapidly he reads, nor 
how well he comprehends. On the other hand, to say that 
this pupil reads a certain printed text at the rate of one hun- 
dred and sixty words per minute tells his rate of reading, but 
does not tell his standing as a fourth-grade pupil. His stand- 
ing as a fourth-grade pupil can only be learned by comparing 
his reading rate with the standard rate for fourth-grade 
pupils. 

School marks vs. scores. Usually a pupil's "grade" on 
an examination is the per cent of questions he answers 
correctly, credit being given for answers partly right. Thifl 
practice means that the number and difficulty of the 
questions represent the standard of attainment for pupils 
of the grade to which they are given. A "grade" of 87 
per cent placed upon an examination paper represents a 
comparison of the pupil's ability with the standard which the 
teacher established in the preparation of the examination. 
Alow "grade" may be due either to the pupil's lack of abil- 
ity, or to the high standard which the teacher set. Likewise 
a high "grade" may be due either to the pupil's exceptional 
ability, or to an easy examination. Thus " grades " expressed 
as per cents are not measures of the attainments of pupils, 
but only terms which describe pupils' attainments In com- 



ini'^^^l 
m- ^l 



960 EDUCATIONAL TESTS ASD MEASUREMENTS ' 

paiisoD with sUncUrds establishd by the teacher. To 
define a "grade" oi "excellent" as meaDing frtHn "95 to 
100 per cent" is simply to exchange one descriptive term 
for another. It does not define "excellent. " 

Teachers frequently fail to distinguish between scores, 
and school marks or " grades," This is especially true when 
the examination is considered to consist of 100 points, and 
the "grades" are expressed in terras of per cents, but the 
two are not the same and should not be confused. The rela- 
tion which exists between scores and " grades" may be shown 
by the following illustration. Suppose that a teacher of a 
seventh-grade class constructs an examination which she 
considers includes 100 units. In terms of the units of th \^ 
examination, the five highest scores out of a class of 30 pupils 
are as follows: 80, 77, 75, 74, 72, and the five lowest are 49, 
47, 46, 44, 41. Suppose, also, that in this school the school 
"grades" are to be reported in per cents, with 70 as the 
passing mark. Do these scores mean that only those pupils 
who have scores of 70 units or above are to be given passing 
"grades"? Certainly not. The scores in terms of units 
must be translated into school marks, but before this can be 
done, the basis of translation must be determined, and this 
basis involves knowing what scores seventh-grade pupils 
should make on this particular examination. 

Translating scores into school marks. The conditions 
described above are represented graphically in Pig, 80. 
The base line marked from 24 to 100 represents the scale 
of the test, the portion from 24 to being omitted because 
it is not used in the illustration. The five highest scores 
and the five lowest scores are marked on this scale. We know 



■ THE MEANING OF 


SCORES '■ 


!»1^^^| 


that when a sufficiently large number of pupils are measured, m 


with resi>ect to any 








ability, they will be 




, 


§ 


s e 
" 2 


^ grouped in the man- 




1 


S 


1' - ' 


ner indicated by either 












of the curves in the fig- 






g 


S E 


ure. There will be a few 




/ 


S 


1 


' i Ji 


with very high scores, 




/i 




£ 


sljl 


a few with very low 


. 


/ 1 


3 


■s 


L'l 


scores, and a large 


/ 


I'S 


S 




111 


niunber very near the 


/ 


/r; 






average of the group. 


/m 


' t 




■a 


"Is 


Knowing this fact, it 


(ijl 


it 


K 




is simply a question of 




S 




where this distribution 


\ / "~ 




-s 


1 1^ 


will be located on the 


\y 




3 




scale line in the caae of 


y\ 




S 




seventh-grade pupils. 


l.-i ^ 


\ 


s 


s 


11 


The solid line curve 
indicates one possible 


\ 


\I 


u 




location of the distri- 


\l 


3 


e 


"!il 


bution. After the loca- 


V 






1 SI 
S la 

ft! 


tion of the distribution 


\ 


, ■- 


3 




baa been determined, 
there remains the 




\ 




3 


placing of the passing 




' 


' 1 


mark in this location. 




«ii 


-8 'M 


This is an ai-bitrary 




% 


1* 

- 1 


matter, and any school 




1 

■ 


may place it where it 


^ 


Si' 



2«3 EDUCATIOXAL TESTS ANB MEASUREMENTS ~ 

desires. Assume that in this particulBr case it U located at 
48 on the scale of the test, .\gain this passing mark may 
be called anything as a school inark- Assume that in this 
particular case it has been called 70 in this school. Now, 
we have a satisfactory basis for translating the scores into 
school marks. The scores marked on the scale of the test 
are to be read from the solid line scale below. The pupil 
whose score was 49 would receive a school mark of 71. For 
only the four lowest scores a school mark of failure would 
be given. For the score of 80 a school mark of 88 would be 
given. 

If, however, the lof^tion of the distribution of the scorea 
of seventh-grade pupib should be that indicated by the 
broken line curve we would have a different basis of trans- 
lation. The school marks would be read from the broken 
line scale. For the lowest score of 41 a school mark of 75 
would be given. 

This illustration should make clear that scores are not the 
same as school marks, and should also indicate the informa- 
tion which is necessary for a teacher to have before she can 
make an accurate translation of scores into school marks 
or "grades." Only in the case of tests which have been 
standardized will it be possible for the teacher to use such 
an elaborate plan of translation. Tests constructed by the 
teacher are not standardized, but a crude standard dis- 
tribution may be obtained, as follows: Divide the test 
into convenient units or points. It is best to choose the 
unit, so that the total is not 100, but some other number 
as 28, 39, or 163. In reading the papers, assign scores in 
terms of this unit. If these scores are arranged in order 



THE MEANING OF SCORES 2CS 

the teacher will have a crude staadard distribution, and, 
by comparing individual scores with it, "grades" may be 
assigned more intelligently than by making no distinction 
between scores and "grades." 

A satisfactoiy standard. If many of the critics of the 
public schools are only partly right in their contentions, 
present conditions are far from what they should be. Thus 
we are not warranted in assuniing that standards which 
simply represent present conditions are satisfactory, and 
we may appropriately consider the basis of satisfactory 
standards. 

A satisfactory standard must he reasonable and must be 
"efficient." To be reasonable a standard must be such 
that it can be attained by pupils, under school conditions, 
and with an appropriate time expenditure. Pupils are limited 
in their learning by inherited characteristics, and all cannot 
attain to the same levels of skilL However, it must not be 
forgotten that the medians, or averages, of present attain- 
ments of pupils are far below the leveb of skill which have 
been attained by many pupils and adults. For example, 
the eighth-grade standard for tbe Courtis addition test is 
only eleven examples attempted. Some eighth-grade pupils 
are able to do twenty or more examples, and adults who have 
been specially trained have done fifty to sixty examples. 
In fact Courtis, who has studied arithmetical abilities for 
several years, states that it appears that there is no limit 
to the rate at which columns of figures may be added, pro- 
vided the amount of time for practice is unlimited. In the 
school the time for practice is limited, but even then the 
standard of eleven examples is markedly below the attain- 



264 EDUCATIONAL TESTS AND ME, 

ment of many pupils under present school conditions, and it 
still remains to be seen what degree of ability to perform the 
fundamental operations of arithmetic with integers might 
be engendered with the present time allotment, provided 
the methods and devices of instruction were properly suited 
to the pupils. It seems probable that the standards which 
we have now are below the level of the possible attainment 
of a large per cent of pupils under school conditions. Just 
how large this per cent will be when the methods and de- 
vices of instruction are appropriately adjusted to the pupils 
canbedeterminedonlyby actual trial. Itmay be that, when 
we learn to adjust our methods and devices of instruction 
better to the abihties of pupils, a higher standard of ability 
may be attained with a less expenditure of time and teach- 
ing effort. 

An efficient standard. The second qualification of a satis- 
factoiy standard is that it be " efficient." By this it is meant 
that the standard must represent a degree of ability which 
equips pupils for meeting present and future demands with 
a high degree of efficiency. The word efficiency has been bor- 
rowed OF rather adopted from the field of engineering and 
mechanics. The efficiency of a machine such as a steam 
engine is the value of the fraction whose numerator is the 
amount of work which the engine does, or its accomplish- 
ment or output, and whose denominator is the amount of 
energy put into it in the form of fuel. The value of this 
fraction may be increased in two ways: first, if the numer- 
ator, tliat is, the amount of work done, is increased without 
increasing the amount of energy put into the en^ne, or at 
least without increasing it in the same proportion; second. 



THE MEANING OF SCORES 265 

if the denominator is decreased without decreasing the num- 
erator in the same proportion. The most efficient machine 
is the one for which the value of this fraction is the largest. 

The word efficiency with essentially this meaning is now 
employed with reference to many forms of human endeavor. 
The numerator consists of the actual accomplishment. The 
denominator consists of energy and time which are put into 
the project, both in the form of preparation and in the actual 
doing at the time. For example, the contractor for a building 
erects a tower for hoisting and distributing the concrete used 
in the construction. He installs a rock crusher and other 
machines and appliances which will form no part of the com- 
pleted building. All of these things are a part of the expense 
he puts into the building. In addition, he puts a large quan- 
tity of labor into it. These two items, together with the 
building material, are measurable in terms of dollars and 
cents and constitute the denominator of the fraction. The 
product of his endeavor, the completed building, forms the 
numerator of the fraction. His efficiency as a contractor 
is represented by the ratio of these two quantities. 

He might have dispensed with the tower and have had the 
concrete distributed in wheelljarrows. He might even have 
dispensed with the rock crusher and mechanical mixer for 
the concrete. By so doing he would have eliminated these 
items of expense, but it is reasonably certain that if he had 
done so the total exj)ense of constructing the building would 
have been considerably increased by the added labor, and 
this would have decreased the efficiency of the enterprise. 
The total expenditure of time and effort or money is the 
sum of the expenditures for preparatory and accessory 



366 EDUCATIONAL TESTS AND MEASUREMENTS 

purposes, plus the expenditures of actual operation. In a 
large project, if the expenditiire for preparatory and accessory 
purposes is too small, then the operation expenses are unduly 
large, making the total larger than necessary, A high de- 
gree of efficiency demands that there be such an adjustment 
between the two as to make the sum as small as possible. 

Effort to be expended on the tool subjects. The school 
subjects are frequently classified under two heads; — tool 
subjects, and content subjects. The tool subjects of the ele- 
mentary school are reading, handwriting, the operations of 
arithmetic, spelling, and language. In the study of tlie con- 
tent subjects, such as the problems of arithmetic, hterature, 
geography, history, science, etc., the tool subjects are used. 
The situation with respect to school subjects is quite analo- 
gous to that of the illustration just cited. The tool subjects 
are used in further learning in school, and in practical activ- 
ities outside of school. Time and effort are required for 
acquiring skill in using these tools. Time and effort are nlso 
required when these tools are used. If oidy a small degreeof 
skill is acquired, the time and effort required for using the 
tools are greatly increased. For example, time and effort are 
required to leam to add. By increasing the amount of time 
for practice, the skill of the learner can be increased and 
the time required to add numbers in the solving of prob- 
lems and in the practical activities will be decreased. If the 
learner has a large amount of adding to do, it will be econ- 
omy for him to spend a relatively large amount of time in 
practice, that is, in preparation. If he is going to have only 
a few occasions to add numbers, it will not be economy ftw 
him to spend a large amount ot time in practice. 



THE MEANING OF SCORES 267 

The situation is precisely the same as that of the contrac- 
tor who was mentioned. When he is constructing a $500,000 
building it is economy of time and effort (both of wliich have 
a money equivalent) to spend several hundred dollars and 
several weeks of time in preparation for the construction of 
the building. If, however, he were building a $3000 resi- 
dence it would be folly to spend a very large amount in prep- 
aration for the work. So if the pupils in our schools are 
going to have many occasions to add in their future school 
work and in their activities outside of school, it will be 
economy to spend enough time and effort to engender in 
them a relatively high degree of skill. If, on the other hand, 
these pupils are going to have only a few occasions to add, 
it is folly to expend the time and effort to engender in them 
a high degree of skill. What is true of addition is true of 
the other operations in arithmetic and of the skills involved 
in the other tool subjects. 

School demands on the tool subjects. Some of the occa- 
sions for the use of these tools occur in the work of the school, 
and some occur outside of school in practical activities. It 
is generally conceded that these tools should be acquired 
in the first six grades. In the seventh and eighth grades and 
the high school pupils have many demands made upon them 
for reading, writing, spelling, the operations of arithmetic, 
and expression by means of language, both oral and written. 
Our manner of carrying on school work by the use of text- 
books and reference libraries makes the demand for reading 
very heavy. It is also our custom to require much written 
■work, prepared outside of the recitation period, and in some 
subjects much written work during the recitation period. 



A 



909 EDCCATHSf AL TBSIS ASD MEASTSEMESTS 

Thii cotoai aMke* beavy dfnttiids for wtt&^ spdEng, and 
writtea I i|in laiiw Id arithmrtic we expect pupils to learn 
lo aolve problems (not namplrwj hy aolvi^ problems. In 
lact n ere quir e them to solve maiyprobfana, and the solving 
of problems requires arithmetical operations lo be performed. 
In view of the fact that tbe school itself makes enonDoas 
demands upon its pupils for the use of these tools, it is folly 
not to prepare them adequately Im- these demands. It would 
be just as sensible for a contractor of a $500,000 building to 
fan to provide a mechanical mixer for concrete as for a school 
to fail to prepare its pupils to read nith an api»opriate speed 
and quality of comprehension. In the case <rf the contractor, 
failure to provide appropriate machinerj- means that the 
concrete must be mixed by means of back-breaking and 
time-consuming labor. In the case of the school, failure 
to equip the pupils properly to read means that the num- 
erous assignments which they will be asked to read will 
not only consume an enormous amount of time, but will 
also destroy interest in the school work because for them 
reading is a slow and difficult and hence a disagreeable 
process. 

Outside of school there are a number of demands for 
these tools which are common to all; — reading newspapers, 
magazines, and books; writing letters; expressing ideas; and 
solving simple arithmetical problems of everyday life. In 
addition to these there are a number of special demands 
which defjend upon one's occupation. Educators differ con- 
cerning the extent to which public schools should prepare 
pupils for these special demands, but practically all agree 
thut little differentiation should be made below the sev- 



F 

r enth gr 
I for thes 



THE MEANING OF SCORES 



enth grade, and, therefore, the question of preparation 
for these special demands concerns us but little in the con- 
sideration of standards for the tool subjects. 

Basis for standards of accomplishment. The demands of 
the school, and the comraou demands of life outside of 
school, are the requirements which are to be considered in 
the setting up of standards for the tool subjects in the 
elementary school. In general discussions concerning what ^ 
the schools should accomplish, and in practically all of 
the discussions of particular standards, attention has been 
focused upon the demands of life outside of school, and the ' 
demands of the school have been overlooked. This is per- 
haps due to the recent emphasis upon the fact that the 
function of the school is to give children preparation for the 
activities of life outside of school. This is a most wholesome 
and commendable point of view, but its acceptance should 
not blind one to the fact that the demands which the activi- 
ties of the school make for the use of these tool subjects 
exceed many of the demands which the common activities 
outside of school make. The average man or woman does 
not meet as pressing demands for reading as do the pupils 
in the high school. Likew-ise the demands for writing, and 
probably for the other tool subjects as well, which pupib 
meet in school are greater than they will meet outside of 
school. 

By saying that they are greater it is meant, not only that 
the demands are more numerous but also that they involve 
a limited time for their satisfaction. For example, when a 
pupil is given a test in school it is not only necessary that he 
write legibly, but it is also very necessary that he write 



«70 EDUCATIONAL TESTS AND MEASTREMENTS 



lably rapidly and without focasing his attention upon 
the act of writing. If he does not he is seriously hindered in 
i:n§wering the questions. 

Even if the tool subjects were not practical, it would be 
necessary for the school to teach them and teach them well 
in the first six grades, in order that the pupils might do the 
work of the following grades. There is no valid basis for 
the argument that, since few of the pupils will become 
bookkeepers or clerks, or enter other specialized occupations, 
emphasis upon definite standards of skill in performing the 
oi>erations of arithmetic and in handwriting is unjustifiable. 

It should, however, be recognized that in the ease of the 
content subjects the source of the standards of attainment is 
the demands of life outside of school. For example, if we 
were considering standards for the solving of problems in 
arithmetic instead of the doing of examples, the source of 
standards would be the demands which exist in the prac- 
tical activities of life. 

It should be evident from the foregoing discussion that a 
standard may be set too high. On the other hand it may be 
too low. Either condition means a low degree of efficiency, 
A teacher should not take pride in the fact that she has 
brought her pupils up to a point well above the standard. 
This condition may mean that she is just as inefficient as 
the teachers whose pupils are below standard, her inefficiency 
being due to an unusual expenditure of time for the engen- 
dering of this particular outcome. 

Types of standards. The measures of a class or other 
group of pupils taken as a unit are represented by the 
central tendency and variability of the distribution of the 



w 

I indi villi 



THE ME.\NING OF SCORES 271 

individual scores. In some classes we find a very great 
range of ability; in others it is very much less. The range 
of the distribution of tJie scores or the variability should 
be considered, as well as the central tendency. For giving 
meaning to these measures corresponding standards are 
required. In the absence of scientifically-derived standards 
the corresponding central tendency and variability of a 
large group of pupils are used. Another basis of interpre- 
tation is to use the corresponding measures of a number 
of similar groups. The position which a particular class 
occupies am.ong a number of classes gives a meaning to 
its scores. The comparison of the scores of a class with 
those for classes in grades above and below is often illum- 
inating, because the instruction of a school should be or- 
ganized so that the pupils' progress through the several 
grades will be systematic. 

Class scores are only indices of the general standing of 
the class. Since the teacher is instructing individual pupils, 
rather than a group of pupils taken as a imit, she needs to 
interpret the score of each pupil. Averages or medians alone 
are insufficient to do this because pupils differ widely in 
ability. To interpret fully individual scores it is necessary 
to have a standard distribution of scores. The meaning of 
a score depends upon its position in this standard distribu- 
tion. The method of doing this ia shown in the illustration 
on page 260. 

QUESTIONS AND TOPICS FOR INVESTIGATION 

1. What iathe distinctioo between "scores" imd "school mark^' ? 

£, Wliat thiDgs must be knowu in ordec to translate scores into school 




272 EDUCATIONAL TESTS AND MEASUREMENTS 

3. What do you think of the plan of placing scores on pupils* report 
cards instead of school marks? (It is assumed that if scores are used 
the standards will be given also.) 

4. Why are school mari^ such as "92 per cent,'* "good/* "excellent**' 
and the like, indefinite? 

5. Why are standards necessary? 

6. Why are ava'ages and medians not satisfactory standards? 

7. How must satisfactory standards be determined? 

8. What is the meaning of "efficiency ** in education? 

9. How is a pupil*s score interpreted? 

10. What different types of standards are available? 

11. What is a "standardized** test? 



CHAPTER X 

THE DERIVATION OF TESTS, AND EXAMINATIONS 

Analyzing pupils' class work. In constructing an iostm- 
ment to measure the achievements of pupils, tlie first step 
is to secure the exercises or specimens of pupils' productions 
which shall constitute the test or scale. Different school sub- 
jects present different problems, but in all cases this first step 
involves the analysis of the subject-matter, to determine the 
achievements which are fundamental. A brief account of the 
necessary analysis for certain subjects has been given in 
the preceding chapters, under the head of the "problem of 
measurement" and in the description of certain tests and 
scales. The method of analysis for the operations of arith- 
metic is well illustrated in the derivation of the tests for 
addition of fractions. (See page 34.) Upon the hypothesis 
that each type of example requires a specific ability, the 
addition of two fractions was analyzed to determine the dif- 
ferent types of examples which might arise. This analysis 
was verified by a study of the ability of pupils to do the 
several types of examples. 

In the field of spelling a different type of analysis has been 
made. By examining the writings of children and adults, 
Ayres and others have determined the most frequently used 
words. Starch analyzed the non-technical words of the Eng- 
lish language so as to secure random samples for his spelling 
tests. In addition it has been shown that the ability to spell 



4 



r 



i7i EDUCATIONAL TESTS AND MEASUREMENTS 

words in dictated lists is not the same as the ability to 
spell the same words when they are given in timed sen- 
tences. 

Exercises or specimens of pupils' productions are only the 
ci'ude materials out of which a nieasuring instrument is to 
be constructed. The measurement of quantity requires a 
unit. For measuring length we have the yard and meter; for 
measuring weight, the poiind and gram; for measuring time, 
the second. For measuring the abilities of pupils to perform 
the operations of arithmetic, to write, to read, to translate 
Latin, and the like, units of elemental abilities are necessary. 
The abilities of pupils arc measured by having them do cer- 
tain exercises, or by comparing specimens of their work with 
specimen.s of known value. The exercises or specimens of 
pupils' work which constitute a test or scale must be evalu- 
ated in terms of a unit. 

Bases for evaluating pupils' worfc. The value of an exer- 
cise or specimen may be considered from the standpoint of 
its importance or of its difficulty. The time required in doing 
an exercise is a third factor, if difficulty is defined in terms of 
the per cent of correct answers. It is not possible to meas- 
ure objectively the importance of an exercise. It is an easy 
matter to secure a measure of the difficulty of an exercise by 
having it given to a large number of pupils. The per cent 
of pupils who solve it correctly is an index of its difficulty. 
For this reason exercises are generally evaluated on the 
basis of their difficulty. Specimens of pupils' productions, 
as in the case of handwriting or English composition, ob- 
viously cannot be evaluated on the basis of difficulty of 
production. In constructing scales for these subjects. 



TESTS AND EXAMINATIONS «75 

specimens have been evaluated on the basis of the consen- 
sus of opinion of competent judges. 

The nature of the subject- matter of arithmetic is such 
that it ia possible to construct examples which aje approxi- 
mately equal in difficulty and hence represent equal degrees 
of ability. A multiplication example consisting of a four 
figure multiplicand and a three figure multiplier is approxi- 
mately equal in difficulty to any other multi plication ex- 
ample similarly constructed. Hence, in arithmetic, each 
of the exercises which compose the test may be constructed 
equal in difficulty, thus avoiding the necessity of evaluating 
them. However, with the partial exception of algebra, this 
ia not true of the other school subjects. 

The cycle principle. In constructing the Standardized 
Algebra Tests, Rugghaa employed the " cycle principle " of 
rotation.' This principle is employed when it is desired to 
include two or more types of examples in a single test. Ac- 
cording to this principle the examples are arranged so that 
each type of example recurs at a regular interval. The cycle, 
instead of a single example, is the unit. Thus the necessity 
of evaluating the exercises in terms of a common unit is 
avoided. 

The per-cent-of-pupils-solving basis. The most fre- 
quently used method ° of evaluating the exercises of a test 

' Rugg, H. O.. and Clark. J. R. "Standanlized Teats and the Improve- 
ment of Ttachiog ia First-Yeur Algebra"; in School Renciv, vol. 26, 
pp. 113-32. 

* A detailed descriptioa of this method, applied U> a particular sub- 
ject, is given in each of the following monographs: 

Buckingham. B. R. SpfUing Ability ; Ita Mearurement and DUtnlmtion. 

Tralnie, M. R. ComyUion-Teit Lanjuage Scaleg. ' 

Woody, Clifford, MeaturemenU of Some AckinemcnU in Arilkmetio. 



y 



476 EDL'CATIOX.\L TESTS AND MEASXJEIEMENTS 

is to Lave them given to a large number of pupib. The rela- 
tive difficulty of exercises depends upon the per cent of re- 
sponses which are correct, but tlie degree of difficulty is not 
, proportional to the per cent of correct responses, For exam- 
ple, suppose exercise A b answered correctly by 40 per cent 
of a group of pupils, and exercise B by 80 per cent of the 
same pupils. Then exercise A is more difficult than B, but 
not necessarily twice as difficult. This is due to the fact that 
children do not exhibit uniform differences in ability, 

A nonnal distributioii of ability. It is a well-known fact 
that when a group of pupils Is measured with respect to a 
mental or physical characteristic they are found to be dis- 
tributed as shown in Fig. 21. The pupils near the median 
ability-group differ only slightly hi ability, while those at 
either extreme exhibit large differences in abihty. The de- 
gree of difficulty of an exercise which is done correctly by 
80 per cent of the pupils is represented by the position of A 






in Fig. 21. The degree of difficulty of an exercise, answered 
correctly by 40 per cent of the pupils is indicated by the po- 
sition of B. The degree of difficulty of exercises which were 



TESTS AND EXAMINATIONS 277 

done correctly by 90 per cent and 50 per cent of the pupils 
is indicated by the position of C and D respectively. The 
difference in degree of difficulty which is represented by 
the difference between 90 and 80 per cent of correct responses 
is obviously not equal to the diiTerenee in degree of difficulty 
which corresponds to a difference between 50 and 40 per 
cent of correct responses. This illustrates the relation be- 
tween the degree of difficulty of an exercise and the per cent 
of correct responses. Tables ba.ve been prepared to show the 
degrees of difficulty, corresponding to the various per centa 
of correct responses.' The degree of difficulty is expressed 
in terms of the probable error (P.E.). (See page 249 for 
definition.) The P.E. is used as the unit, since it b a con- 
stant function of all normal groups. 

Points to be considered in evaluating exercises. In trans- 
lating the per cent of correct responses into P.E. equivalents 
the values are counted from the median of the distribu- 
tion. To express the absolute value of an exercise a zero 
point must be established. This is done by constructing an 
exercise which calls for zero ability. The other exercises are 
then compared with this one. For the method used see the 
work of Buckingham, Trahue, or Woody. 

Exercises which are foreign to the experience of pupils 
do not furnish a satisfactory measure of their ability. Exer- 
cises which are suitable for eighth-grade pupils may not be 
appropriate for pupils in the fourth or fifth grade. In con- 
structing a test which is to be used in several grades it is 
necessary to reject those exercises which are foreign to the 
experience of the pupils in any of those grades. In deriving 
' See any one of the monographs referred to above. 



278 EDUCATIONAL TESTS AND MEASUREMENTS 

hia arithmetic scale Woody selected only those examples 
whicfi were done correctly by a gradually increasing per cent 
of pupils in the grades from two to eight. 

In evaluatiag the exercises of the Kansas Silent Reading 
Tests, Kelly considered the time taken by pupils in doing 
them, as well as the correctness of their answers. The aver- 
age number of seconds required to produce a correct answer 
was obtained by dividing the total time used by the number 
of correct answers. The values of the exercises were made 
proportional to the average time required to produce a cor- 
rect answer. 

Opinion of competent judges. Killegas used the con- 
sensus of opinion of "competent" judges in evaluating the 
merit of pupils' compositions, in constructing hia composi- 
tion scale. This method has been used also by Thorndike, 
in constructing scales for handwriting, drawing, and alge- 
bra. The method is based on the theory that differences of 
merit or quality which are noticed equally often are equal. 
Suppose a set of compositions. A, B, C, D, E, etc., are ar- 
ranged in order by 100 competent judges. Suppose B is 
ranked better than A 75 times and poorer than A 25 times; 
suppose C is ranked better than B 75 times and poorer 
than B 25 times ; suppose D is ranked better than C 75 times 
and poorer than C 25 times. Then the difference in merit 
between B and A ia equal to the difference between C and 
B, and is equal to the difference between D and C. Or, in 
other words, the successive differences in merit between 
the four compositions A, B, C, and D arc equal. By hav- 
ing a large number of specimens ranked in order by 
petent judges, it is possible to select a sufficient number 



er of 



TESTS A\D EXAMINATIONS 



279 



specimens representing equal differences of merit to form 
a scale. Thomdike recommends that the difference which 
is noticed by 75 per cent of the judges be used as a 
miit.' 

The teacher-judgment basis. Ballou used the teacher- 
judgment basis in constructing the Harvard-Newton Com- 
position Scale, but employed it in a different manner. In- 
stead of having the compositions ranked singly in order of 
merit, he had numerical values assigned to each by each 
teacher, following certain directions. The average of these 
values was taken as the true value. 

Reliability important. If a test or scale is to be effective as 
an instrument for educational diagnosis, it must yield re- 
liable measures of definite abilities. The definitcness of the 
measures is determined by the selection of the exercises. If 
the exercises are restricted to a single type, as in the Courtis 
Standard Research Tests, Series B, the scores will have a de- 
finite meaning. The reliability of the scores depends upon 
the method of using the test or scale. Shght variations in 
the time allowed, or even in the manner the pupils are ap- 
proached, affect the scores. Scares are also affected by the 
plan employed in marking test papers. If the exercises ad- 
mit of only one correct answer, and if they are marked either 
right or wrong, no variation is possible. But if it is left to the 
teacher to judge the merit of the answer, as is the case in cer- 
tain reading tests, there is certain to be a large degree of 
variation. Since scores are given meaning by comparing 
them with standards, it is necessary that the scores be ob- 

I For details of Ihe mtlhod see Thomdike, E. L. Mental and Social 
Meamremeniii, iip. 122 and 121. 



4 



k 



280 EDUCATIONAL TESTS AND MEASUREMENTS 

tained under the same conditions. Thus a vital feature of 
a test or sc&le is definite directions for using it. 

The use of tests and scales must not be too complex or make 
unreasonable demands upon the teachers' time. Tests which 
can be administered to only one pupil at a time require too 
much time. The plan for scoring the test papers may be so 
complex that it requires an unreasonable amount of time. 
Individual scores must be tabulated to determine class 
scores. For this purpose tabulation sheets must be provided 
unless it is left to the teacher to construct her own. 

Making examinations more effective. In Chapter I it was 
shown that the measures obtained by giving ordinary ex- 
aminations were unreliable. It is possible for teachers to 
increase greatly the reliability of these measures. A sys- 
tematic plan for marking examination papers will materi- 
ally reduce this source of error. Kelly '■ describes the fol- 
lowing experiment. Six fifth-grade teachers gave a uniform 
examination in arithmetic to their pupils. Each teacher 
marked the papers for her own pupils, but did not record 
the marks on the papers. The superintendent asked a 
teacher who was unusually systematic in marking exami- 
nation papers, to prepare a definite plan for marking these 
papers. After she had done so, she marked all of the papers 
in accordance with this plan. Then the teachers who had 
first marked the papers marked them a second time follow- 
ing her plan. This provided two marks for each paper by the 
classroom teacher, the first without following a systematic 
plan, and the second using a definite plan. Each of these 
marks was compared with the mark of the teacher who 
' Kelly, F. J. Teachers' Marks, p. Si. 



1 




TESTS AND EXAMINATIONS 281 

marked all of the papers. In Table XXXII the six teachers 
are designated by the letters A, B, C, D, E, and F, The table 
is read as follows: when no systematic plan was followed, 
teacher A marked one paper 16 to 20 points lower than the 
"judge," one paper 7 points lower, two papers 4 points 
lower, two papers 2 points lower, agreed with the "judge" 
on one paper, etc. The differences between the marks given 
when the classroom teachers had no standard or systematic 
plan and when they followed a standard are very striking. In 
the first instance the marks assigned by the teachers agreed 
with those assigned by the " judge" in only 5.5 per cent of 
the cases, while in the second instance they agreed in 63.5 
per cent of the cases. 

Care in framing questions. Beliability in marking exam- 
ination papers also can be increased by exercising care in 
constructing the questions. Frequently the same purpose 
can be realized by forming the question so that only one cor- 
rect answer is possible. 

The error dne to the unequal value of the questions can 
be reduced. Obviously it is impossible for a teacher to em- 
ploy the elaborate method described in this chapter for 
evaluating the questions, but approidmate relative values 
can be determined from the ratio of the number of correct 
answers to the number of wrong answers. Even if this is - 
done in only a very crude fashion, it is worth while. 

The rate at which the pupil works should be recognized, 
particularly when the exercises call for automatic responses. 
When the pupil is required to reason out his answer, the time 
he takes is not so important, but probably should not be 
enlirely neglected. It is very easy to measure the rate at 



282 EDUCATIONAL TESTS AND MEASUREMENTS ' 

Table XXXII. DiSTRmuriONa of Differences between Two 
Teachers' Marks on Sets of i'lFTH-GaADE Abitbmetic 
Papers — First, without amy Effort to dnut the 
Methods used, and Second, bt a Coumon Standard 

(after Kelly) 




Wilh<»d,U«lliOTd 


,F^-,d- 


" 


B 


C 


D 


E 


F- 


TtOai 


' 


B 


C 


D 


E- 


J? 


To,.! 


aiorm™ 

IBIaW 

w.:::.'".'. 
9. 

I:::;::-.; 

i.'.'.'.'.'.'.'.'. 
i.'.'.'.'.'.W. 

1 

3.'.'.'.'.'.'.'.'. 

4 

B 

s";:::::; 
.as::::::::: 




1 


2 
2 


2 
■3 






I 


\ 


I 


4 

1 


"1 


» 


1 

3 


"2 
19 

13 

> 


35 


•J 


+1 


+6 


s 


s. 


?! 


» 


« 


" 


" 


3. 


33 


=» 






J 


^^^^ 


A" 




k 


^^^^^H 



r 



TESTS AND EXAMINATIONS 288 

which the pupil works as well as his accuracy. This may 
be done by timing each pupil while doing a uniform amoimt 
of work. It may also be done by having all pupils work at a 
test a given number of minutes. From the amount of work 
each pupil does his rate of working can be computed. 

The indefiniteness of the meaning of examination marks 
can further be reduced by confining the questions to one or 
two topics, or to a small group of closely related topics. 
Sometimes, too, it will be advisable to give a series of short 
examinations, rather than a single long one. 

QUESTIONS AND TOPICS FOB INVESTIGATION 

1. What does the problem of measurement involve? 

S. What problems are involved in the ccmati-uction of a teat? 

8. What methods have been used in determining the value of exen 

4. What are the objectiona to the method of "opinion oF competent 1 
judges" f I 

5. How may examinations be made more accurate measuring instru- I 
ments? I 

6. Do you think that examinations have functions other than that id 1 
tneasuring the abilities uf pupils? If so, what are they? 

7. Repeat the experiment descrilied on page 280 and compare your re- I 
aulU with those given in Table XXXIl. 



USE OF STANDARDIZED TESTS IN THE SUPERVISION 
OF INSTRUCTION 

In the preceding chapters standardized testa and scales 
ha^-e been considered primarily from the point of view of 
tfaeir usefulness to the teacher. They are also valuable to 
the supervisor. It is the problem of this chapter to set forth 
the supervisor's relation to the use of standardized tests and 
scales, and how they may be used by him in the supervising 
of instruction. 

Assisting teachers. The supervisor (superintendent, 
principal or special supervisor) must assume the responsi- 
bility of educating the teachers of a school system in the 
making and using of educational measurements. This phase 
of educational development is so new that few teachers have 
become acquainted with it in the course of their professional 
training. It is important that the teacher think of test? and 
scales as instruments which will enable her to make her in- 
structional efforts more effective. There is considerable evi- 
dence to show that if a teacher misinterprets the function 
of tests or looks upon them with suspicion, her efficiency 
as an instructor will be lowered by their use. 

Some supervisors claim that the scores obtained by the 
use of standardized tests are an index of the teacher's 
efficiency. If the scores of her pupils are below standard, 
the teacher's efficiency is low. If they are above standard, 
the teacher's efficiency is high. Under certain conditions 




TESTS AND SUPERVISION S85 

this is probably true, but under the conditions which fre- 
quently prevail this conclusion cannot be justified. The effi- 
ciency of a teacher must be judged upon the basis of the 
growth which pupils make under her instruction. The low 
scores of a class at a given time may be due to the ineffi- 
ciency of their present teacher or of their former teacher. 
Also, due consideration must be given to the quality of 
supervision, the specifications under which the teacher ia 
working, the equipment with which she is supplied, and the 
native ability of the pupils. 

The supervisor should assume the initiative in selecting 
suitable tests and scales and in planning their use. The 
criteria which should guide the selection have been given on 
page 85, Measurement at the beginning of the school year 
furnishes an inventory of the situation which the teachers 
and the supervisor face. Measurement at the close of the 
school year will reveal the extent to which they have met the 
requirements of the situation. In addition, measiu^ment in 
September provides a check upon the work of the previous 
year. If a teacher has secured high scores in May by unfair 
or unwise means, this fact will be revealed when the pupils 
are tested again in September. 

Four steps in supervising instruction with tests. In the 
making and using of educational measurements four steps 
may be recognized. First, giving the tests; second, tabu- 
lating the scores and calculating the central tendencies, 
variabilities, etc.; third, interpreting the scores; fourth, 
modifying instruction to meet the needs revealed. The 
supervisor can render valuable service in each of these steps. 
In a number of cities this service has been considered 



L. 



r 



S86 EDUCATIONAL TESTS AND MEASUREMENTS 

important enough to justify the creation of a department of 
"educational research and eificiency." This has not been 
confined merely to the larger cities, but has been done in a 
few cities of 20,000 to 25,000. In a few cases similar depart- 
ments have been created by county superintendents. 

Giving the tests. The manner in which the test is pre- 
sented to the pupils affects the scores. The purpose of meas- 
urement is defeated if the test is presented to the pupils 
in such a way that their response is mmatural. For example, 
in handwriting, if the pupils write at an unnatural speed the 
quality of their handwriting will be affected. It is also neces- 
sary that there be uniformity in making the measurements if 
comparisons are to be made. The supervisor should train the 
teachers to give the tests properly, or arrange to have some 
trained person give them to all classes. 

Tabulating the scores. There must be uniformity in the 
tabulation of the scores. If directions and blanks for tabu- 
lation are not furnished with the tests chosen, the super- 
visor should decide what directions are to be followed. If 
the tabulation requires much time it is probably unwise to 
require the teachers to do it. Substitute teachers, and in 
some cases normal -training students, can be utilized for this 
purpose if trained clerical help is not available. In addition 
to the tabulations which may be made by the teachers, the 
supervisor should make others to show the situation for the 
school as a whole. Such tabulations are helpful not only to 
the supervisor, but also to the teacher. She needs to see her 
work in relation to the work of the school as a whole. 

Interpretation of the scores. The interpretation of scores 
has been treated in Chapter IX, and hence can be passed over 



TESTS AND SUPERVISION 287 

with a bare mention here. The supervisor should assume 
responsibility for making the teachers aware of the com- 
plete significance of the informatioa which the tests have 
provided. It is his function to provide standards and simi- 
lar scores from other cities, if they are not easily accessible 
to the teachers. The significance of a group of facts is more 
easily grasped when they are represented graphically. The 
supervisor can render valuable service to the teachers by 
preparing a chart or series of charts to show the standing 
of the several grades of the school in comparison with the 
Standards. Several schemes of graphical representation 
have been illustrated in Chapter VIII. 

Remedial treatment. The fourth step, modifying instruc- 
tion to meet the needs revealed, is the culmination of the 
first three. Without this step standardized tests become 
mere "playthings" and their use cannot be justified. The 
omission of this step creates a. situation similar to that 
which would exist if a physician examined a patient care- 
fuUy and determined the natiu^ of hia ailment but did not 
prescribe any remedial treatment. In our zeal to convert 
teachers to the acceptance of the principle that the measure- 
ment of certain results of instruction is possible there has 
been a tendency to overlook this step. In fact some have 
even said that they were conteat to apply the tests and re- 
veal to the teachers the shortcomings of their work. These 
persons would leave to the teachers the difficult problem 
of remedying the defects. As a result not a few teachers 
have failed to see in the tests anything more than a new 
"plaything," which they might use to secure material for 
a paper to read at a teachers association or to arouse the 



ess EDUCATIONAX TESTS AND MEASUREMENTS 

interest of their pupils. Such teachere have expressed their 
upproval of the tests when their pupils' scores were high, and 
have considered the teats unsatisfactory when the scores 
were low. 

Standardized tests are not " playthings." Neither are they 
teaching devices. They are instrnmenta whose function 
is to reveal the conditions which exist so that the teacher's 
efforts to instruct her pupils can be made more effective. 
This function is not fulfilled until this fourth step is taken. 
In securing thb fulfillment the supervisor renders significant 
assistance to his teachers. He can suggest remedial meth- 
ods and devices. Numerous remedial devices have been 
mentioned in Chapters II to VII. Another device which has 
been tried with at least some measure of success is to make 
a special classification of the pupils on the basis of their 
ability, as shown by a standardized test, for the purpose of 
instruction in subjects such as spelling or handwriting.' 

Teachers need detailed and definite ^lecific&tions. De- 
tailed and definite specifications of the product are necessary 
for efficient work in even simple undertakings. It is par- 
ticularly important in the education of children. Providing 
these specifications is the duty of the supervisor. Ekluca- 
tion consists of producing changes in children. When they 
enter the school they represent "raw" material which is to 
be modified by a series of workers (teachers). By means of 
tejrt-books, and the other material equipment, and by means 
of assignments, questions, and directions, teachers cause 
children to acquire habits, knowledge, ideals, and other less 

' Hoggerty, M. E. "Sorae Uses of Educational Measurements"; in 
Sclumi and Sodety, vol i, pp. 7S1-T1. 




TESTS AND SUPERVISION 889 

tangible controls of conduct. In the production of material 
things it is a cardinal principle that, "When the material 
which b acted upon by the labor processes passes through a 
number of progressive stages on its way from the raw ma- 
terial to the ultimate product, definite qualitative and quan- 
titative standards must be determined for the product at 
each of these stages." ' 

This principle applies also to the school. If the school 
is conducted in an efficient manner it is necessary that each 
teacher know in detail: (1) the changes which have already 
been produced in the partly educated children that come to 
her, and (3) precisely what changes she b expected to con- 
tribute to their education. Otherwise no teacher can pro- 
ceed with her work in an intelligent manner. For example, 
the teacher of reading needs to know what rate of reading 
and what degree of comprehension she can expect of het 
pupils in September, and exactly what the school expects her 
to contribute to the reading ability of these pupils by June. 

Courses of study represent working specifications. Our 
present courses of study represent the efforts of those occu- 
pying supervisory positions in our school systems to provide 
teachers with specifications for their work. How well they 
have succeeded is illustrated by the following quotations 
from typical courses of study. 

FouHTH Grade 

Reading and literaluTe. Stories read and told lo the class; Roman 
stories, American history stories relating to geography, selections 
from Greek and Teutonic mythology, and poems. 

' Bobhitt. Franklin. "The Supervision of City Schools"; in Ttndflh 
Yearbook of the National Societj/ for the Study of Educaiion, part i. 



I 

4 



290 EDUCATIONAL TESTS AND MEASUREMENTS ' 

A tew choice selections of appropriate prose and poetry a 
be studied, committed to memory, and recited or dramatized. 
See that the pupil stands on both feet and reads smoothly and 
confidently. Watch the voice of pupils; use breathing e 
and avoid harsh, strained reading. Have })upils read many select 
tions silently, then reproduce the thought aloud in order to develop 
the power of gaiaing and expressing the thought of the text. Aim to 
enlarge the pupil's vocabulary, to help him master the thouf;ht 
content, and gain the power to read in a pleasing, well -modulated 
tone. Use much supplemental reading. Explain the purpose of 
the children's department in the Public Library, and encourage 
pupils to read books therefrom. 

Following these general directions, the selections to be 
read are specified. 

Thied Grade: B Class 

Handwriting. Daily drill, lessons five, six, or seven.' During 
the entire year these drills should be used for a few moments at 
the beginniug of each writing lesson. 

Beginning with lesson five, taJce the lessons in consecutive order 
to lesson thirty-five. 

After developing a letter with the class take, from the writing 
book, a word beginning with the same letter and use it for practice. 

If the letter is a capital, follow the word practice by using a sen- 
tence beginning with this letter. All words and sentences should be 
taken from the writing liook. 

Grade 4 B. Arithmetic 

Leading topics. The four fundamental processes with emphasis 
upon multiplication. 

Review. Begularly, constantly, and from the first. The addition 
combinations, subtraction, reading and writing numbers, simple 
tractions. 

MuUijAieation. The tables completed and made automatic. 
Problems with two-place multipliers. Bapid oral practice. 

Dwirion. Short division with long division brace. Rapid oral 

' Reference to a Mamiatfor Teachers, used in Ihis school syltem. 




TESTS AND SUPERVISION 

Fractions. Simple fractions and mixed numbers as needed 
actual practice on concrete form problems. Largely oral and ob- 

Concrete problema. One-step problems. 

Applied problema. Farm products, farm marketing, farm profits. 

MeasoTEs. Quart, gallon, peck, bushel, pound, ton, cord, etc. 

as required by applied problems. 

Such subject-matter directions not quantitative. These 
specifications are in terms of subject-matter rather than 
results. Subject-matter is a way of acting.' A selection to 
be read is a requirement for a certain activity on the part 
of the pupil. Examples and problems in arithmetic are 
challenges to action. Drill exercises in handwriting call for 
motor activity. This activity is not the end to be attained 
but the changes which are produced in the child by hia ac- 
tivity. Questions, drills, assignments, and the like serve to 
spur the child to action. As a result of his activity, motor, 
intellectual, or emotional, the child acquires habits, knowl- 
edge, and ideals. These are the ends for which the school 
exists. 

Children differ in many ways. They differ widely in the 
amount of action which is necessary in order that they may 
acquire a given change. This fact is evident in any class- 
room. Under our present methods of teaching the same 
work is required of all pupils, and as a result we find that 
the pupils of any grade differ widely in their ability to do. 
One pupil learns to add twelve examples in eight minutes 
with 90 per cent of accuracy. Another pupil who has been 
subjected to the same training adds only seven examples in 



I 



> See Chartera. W. W. Methods of Teaching, cbapter 
UoD of this definilion of subject-matter. 



n elabor^ J 



262 EDUCATIONAL TESTS AND MEASUREMENTS 

this time, and only four are correct. One pupil learns to write 
at the rate of eighty-five letters per minute, and with a qual- 
ity of 80 on the Ayres Scale, With the same training another 
pupil is able to wTite only sixty letters per minute, and with 
a quality of 50. Efficient teaching requires differentiated in- 
struction rather than the same instruction for all pupils. 
This has been emphasized in the preceding chapters under 
the head of "remedial instruction." By stating the di- 
rections to the teacher in terms of subject-matter it is sug- 
gested that all pupils be given the same training, regardless 
of whether they need it or not. Since pupils differ widely, 
their needs cannot be the same. 

Such directions lead to f oimal and uniform instruction. 
This emphasis upon subject-matter leads to formalbm. 
Because the teacher is told to use certain subject-matter 
rather than to accomplish specified results, the use of sub- 
ject-matter becomes her purpose. In teaching reading her 
purpose is to have the pupils read the required selections and 
books. In teaching arithmetic it is to have the pupils do the 
examples and problems on certain pages. In teaching geog- 
rapliy it is to cover the specified topics. In manual training 
it is to complete the designated projects. The doing of these 
things has no importance except as the pupils are educated 
by reason of the activity. By transferring the purpose of the 
teacher, and through her the purpose of the pupils, from the 
end to be attained by the use of subject-matter to the use 
of subject-matter itself, the work of the school becomes 
formal. 

When the ends to be attained are mentioned in courses 
of study the terms used are generally indefinite, "The mul- 



TESTS AND SUPERVISION 

tiplication tables completed and made automatic " is indefi- 
nite because there are many degrees of automatization. The 
degree of ability that one teacher would call "automatic" 
would not be accepted by another. " Rapid oral practice " 
is likewise indefinite. A definite statement could be made 
easily by specifying the rate in terms of responses per minute. 

The tests aim to introduce quantitative work. The stand- 
ards for scientifically devised tests define the achievements 
of pupils which are to be attained by the use of subject-mat- 
ter. They make possible the writing of a course of study in 
terms of the results to be attained at each stage of the pupil's 
progress. For example, in handwriting the rate at which 
the pupil is expected to write and the quality of his writing 
are specified for each grade by the standards given in Chap- 
ter V, The abiUties which the pupil is expected to exhibit 
in performing the operations of arithmetic are defined by 
the standards given in Chapter II. When the specifications 
for each stage of the work are expressed in terms of estab- 
lished standards the teacher knows what to expect of the 
children who come to her at the beginning of the year, and 
also what she is to contribute to their education. With her 
attention directed to the results to be secured rather than 
to the subject-matter to be used, the teacher has an oppor- 
tunity to exercise her resourcefulness in using subject-mat- 
ter as a means to that end. 

Ability to do automatically is specific. The ability 
spell "mountain" is distinct from the ability to spell "suc- 
cess." The ability to add a column of four figures is not the 
same as the ability to add a column of fifteen figures. The 
abihty to add two fractions n'ith unlike denominators is 



ic- 1 



894 EDUCATIONAL TESTS AND MEASUREMENTS | 

not the same as the ability to add fractions with common 
denominators. A multiplicity of abilities must be engen- 
dered. The teacher must be made conscious of each ability 
as an end to be attained by each pupil under her tuition. 
Since the teacher is at all times concerned with the details 
of teaching, the general aims of education are not sufficient. 

The inadequacy of general aims of education is well illus- 
trated by the fact that frequently a teacher fails to recognize 
the existence of certain important details. Recently the 
writer found the handwriting of a certain supervisor of hand- 
writing very difficult to read. An analysis showed that this 
was due to the lack of sufficient spacing between words. An- 
other supervisor admitted that she had never thought of 
sjjeed of handwriting as a factor in the aim of the teacher. 
Many teachers give evidence that their aim of teaching 
reading includes only that of oral reading. 

Tests introduce scientific management. The principle of 
scientific management stated at the beginning of this chap- 
ter implies the material being acted upon must be tested 
at regular interi-als in order to ascertain if the specifications 
are being met. By using standardized tests at regular inter- 
vals it is possible for a supervisor (superintendent, principal, 
or special supervisor) to know how teachers are meeting 
the specifications. This information is also valuable to the 
teacher. The use of standardized tests is most effective when 
teachers sympathetically cooperate with tlie supervisor in 
using them. The first effort of the supervisor should be 
to secure this cooperation. Independent use of tests by 
the teachers does not yield complete returns in increasing 
the efficiency of the school. 



r 



TESTS AND SUPERVISION 295"* 

The time given to a school subject has a value in dollars 
and cents. Economy demands that no more time be given 
to a subject than is necessary for the pupils to attain a satis- 
factory standard of achievement. In the absence of definite 
standards school time has been allotted to the several sub- 
jects according to the opinion and interests of the supervisor. 
As a result the amount of time given to the several subjecta 
varies widely.* 

Handwriting an example of wasting time. Investigation 
has shown that no relation exists between the time expended 
and the results obtained. The condition for handwriting in 
forty-seven cities is shown in Fig. 22. Similar conditions 
have been shown to exist in other subjects. 

The comparison in the rank of schools which spend different 
amounts of time upon writing is shown in Fig. 32. Each vertical 
line in this figure represents one city. The hnes upon the same 
horizontal line represent the cities which spend the same amount of 
time in writing. Those on the upper line spend the least amount 
of time, and those upon the lowest horizontal hue, the lai^esL 
The position of the lines in the right or left direction represents the 
rank which was obtauied by the schools as a result of the test. 
Those which are at the left side of the figure arc higher in rank, 
and those which are toward the right are lower. 

If the spending of a large amount of time in writing produces a 
corresponding gain in efficiency, the vertical lines should begrouped 
along a diagonal line running from the lower left-hand corner to 
the upper right-hand corner. That is, those which spend the less 
amount of time should be toward the right, and vice versa. It is 
evident that tliis situation is not represented by the facts. The 
cities which spend the various amounts of time are scattered 
throughout the range. For example, of the two cities which spend 
on the average only 45 minutes per week, one has the eleventh 
xauk and the other the tn'cnty-sixth; while two of the cities which 

' See FouHeenth Yearbook of Ike National Society for Ihe Study of Educa- 
tion, part r, p. SC. 



<m EDUCATIONAL TESTS AND MEASUREMENTS 1 

Ubmtim vn- Ayenca j 






„ 


KV69 










22.G 



















Bull EiaiGEOKBDSSWtf 
Fio. es. DiSTBmnnoN in Rank of 47 Cmss, abk&nqbd ih 

ClABSEH ACCOBDtNO TO THE TiMB SPENT ON HaNDWBITINQ 

apend an average of 95 minutea have the rank of fofty-lhrce and 
forty-four, very nearly at the bottom of the liat. The average rank 
attained by the cities of each time-group are represented in the 
column to the Hght. It will be seen that, with the exception of the 
shortest-time and the longest-time groups, there is some increase 
in efficiency with an increase in time, but this increase, which 
holds on the average, is slight, and the exceptions are so great that 
the amount of time spent appears to have little influence upon the 
results.' 

Standardized tests furnish a means for the supervisor to 
determine in a scientific manner the optimum time to be 
given to each of the subjects. 

The Cleveland reading results a study in efficiencr. If 
the school is efficient a pupil's progress through the several 
grades must be in accord with a general plan. When the 

' Freeman, F, N. FauHeenlk Yearbook of Ou National SocUlji for the 
Sludi/ of Education, part i, pp. 07-68. M 




TESTS AND SUPERVISION 



897 



teachers are working independently the pupil's progress from 
grade to grade may be very erratic. Figa. 23 and 24 show the 
conditions which were found to exist in silent reading in 
Cleveland, Ohio. These conditions are merely typical of 
many which measurement has revealed. The average scores 















h.,r rit 


ea 












■h 


PObi 


HDd 








-} 


r 










y 


y 






f 










/ 






■• 


















4. 





6 



Pig. 23. Average Scohbb in Spb^ed and Qdalitt of Silbn* 
Rbaoino in Each Grade in Cleveland and in 13 Otheb 
CniEs. Ghat'b Silent Reading Tests osed. 

(Ftam Weinnnt Ui Work qf Ihi PMic Schooli, by C. H. Jndd.} 

are indicated by the positions of the small circles. The nu- 
merals near the circles indicate the grades to which the scores 
belong. If the average scores of thirteen other cities are 
taken as a standard, the lack of satisfactory standards in cer- 
tain grades is indicated for Cleveland. The progress from 
the third grade to the fourth is particularly unsatisfactory. 
However, Fig. 24 gives the moat striking evidence of the 
lack of adequate supervision of the instruction in reading. J 



tS8 EDUCATIONAL TESTS AND MEASUREMENTS 















jWilBOl 




, 


1 












\ 


/ 










. 




\ 


^fr 


ssl. 








/ ■■ 




\ 


/. 


, 








a/ 








-^\ 






ObaenBDn 






■• 


■' 
















3,'" 


Qi 


1 




m 




/ 




»*^ 








'I 




I 





(Fnm liaiuriit} Ihi Wiri of Ot FuHic SelmU, b; C. B. JuddJ 

In any one of the three schools represented the progress of 
the pupils is so erratic that it is obvious that no definite 
plan existed. 

Standards for instruction illustrated from arithmetic. 
What can be accomplished by the setting of definite stand- 
ards for each stage of the pupil's progress and by systematic 
testing is illustrated by the following report: ' 

The Courtis Standard Research Testa, Series B, were given in 
October, 1013, to the pupils in the four schools comprising a achool 
system. The condition revealed was very unsatisfa<rtory. The 
median scores were conspicuously below the standards for these 

* Lane, Henry A. "Standard Tests as an Aid to Supervisioa"; in Ele- 
mentary School Journal, vol. 15, pp. 378-86. 




tests in r 



TESTS AND SUPERVISION 



1 



tests in practically all instances. The progress from grade to grade 
vas irregular. The superintendent readied the conclusion that the 
inefficiency of arithmetic instruction was not due to faulty method 
so much as to the fact that teachers and pupils did not know 
definitely what was expected of them. He therefore placed in the 
hands of each teacher a copy of the following announcement: — 
In June there will be another Courti.s test of the same type as the 
one given in October. At that time, the grades are expected to at- 
tain the following standards; 







ifuUiplic<Him 


«uiD^^ 




AUempii 


SigMl 


All^pU 


JMffW. 








4.0 

4.9 
5.8 
6.6 
7.S 
8.4 
B.3 
10. 1 
11,0 
12,0 






6 
7 
8 

a 



10 


7 
5 
3 

6 
6 
4 
2 


t 

3 
4 
S 
6 
7 
8 
9 


9 

a 

' 
6 
5 
4 

^ 

















































Work in the four operations must be stressed. These are tenta- 
tive and minimum standards. They will very likely be raised 
next fall. The success with which classes achieve these standards 
will, in a sense, be a measure of teaching ability. 



When the tests were repeated at the close of the school 
it was found that "with very few exceptions the standards 
set were attained, and in some cases they were exceeded."' 

Results of using the Courtis tests in Boston. In certain 
cities tests have been used long enough to show the effect 

1 Id ualng this illu.strBtinn the writf r assumes no responsibility For the 
standards used, or for the superiDteudent Failing to realize the possibility 
that these standards might be lowered rather than raised. 




806 EDUCATIONAL TESTS AND MEASUREMENTS 

of systematic measurement. In Boston the Courtis Stand- 
ard Research Tests have been used in 29 schools since 1912. 
In the following statement these are called group A schools. 
Seventeen schools. Group B, have used the tests for one 
to two years. In seventeen other schools, Group C, the tests 
were given for the first time in May, 1915. In comparing the 
achievemeDts of the pupib in these three groups of schools 
on the basis of the scores made in May, 1915, Ballou says ; — 

1. In the amount o( work done by pupils in the four fundamental 
operations, Group A schools show superiority over Group B 
schools in sixteen out of twenty comparisons, and over Group C 
schools in eighteen out of twenty comparisons, 

i. In the accuracy with which work was done, Group A schools 
show superiority over both Group B and Group C schools in 
seventeen out of twenty comparisons.' 

The supervisor and tiie stajidardized tests. Methods and 
devices of instruction must be judged by results. The super- 
visor can also use standardized tests in guiding teachers in 
the determination of the best methods and devices of in- 
struction. When a test is given at the middle or close of the 
year the effectiveness of the instruction is indicated by the 
results. It wilt frequently happen that certain teachers are 
securing superior results. This is an indication that these 
teachers are using methods or devices of instruction which 
are superior to those employed by the other teachers. It is 
also possible to evaluate scientifically proposed methods and 
devices of instruction. 

Formerly the duties and responsibilities of the supervisory 
ofGcials were limited, for the most part, to enforcing disci- 

' Ballou. Frank W, "Improving lostruclioD through Educational 
Measurenient"; in Edticatiimal Adminiiiration and Sajiemmon, vol. 8. 
pp. 354-67. 



J 



TESTS AND SUPERVISION 

pline and to performing the clerical duties of their offices. 
A somewhat incidental duty was the supervision of instruc- 
tion. Usually this meant assisting the inexperienced and 
less capable teachers. This was accomplished in two ways. 
The principal or superintendent visited the classroom and 
made note of such deficiencies as he thought important. 
These items were discussed with the teacher, remedies for 
the shortcomings being suggested. Another metliod was to 
take charge of the class and show the teacher how the work 
was to be done. This supervision was necessarily personal. 
It was the opinion of the supervisor against the opinion of 
the teacher. Standardized tests make possible adifferent tyi>e 
of supervision of instruction. By their use supervision be- 
comes impersonal, it is no longer the opinion of the super- 
visor against the teacher. Both must submit to facts. By the 
use of standardized tests the supervision may be made 
scientific and one of the most significant results will be the 
development of a scientific attitude on the part of the teacher 
towards her work.^ 

Standardized tests provide a means whereby a supervisor 
may render an accounting to the citizens of the community. 
If the business meu question the quality of the product of the 
schools, he can present facts to show the quahty as com- 
pared with that of other cities. Superintendents have testi- 
fied that standardized tests would be worth while if they 
did nothing more than furnish an effective reply to the 
chronic faultfinders. 

' Reiul in this connection Morriaiin, J. Cayce. "The Supervisor's 
Use of Standard Teats of Efficiency"; in ElenerUary School Journal, vol. 
17, pp. 335-04. 



302 EDUCATIONAL TESTS AND MEASUREMENTS 

But standardized tests do more. Their use eliminates 
"guesswork." The supervisor may now think about his 
work in the same type of terms as the manufacturer uses. 
He now has objectively defined units of achievement which 
he may use. The interpretation to be placed upon a given 
degree of achievement is not a matter of personal opinion. 
Standards furnish an im[>ersonal basis which must be rec- 
ognized by all. The value of being able to think about one's 
work in terms of facts and objective units, instead of in 
terms of opinions and vague terms, may not be immedi- 
ately apparent. In the business world it is represented by 
the difference between success and failure. It is difficult to 
conceive any reason why the same significance should not 
exist in the field of education. 

QUESTIONS AND TOPICS FOR INVESTIGATION 

1. What ore the steps in supervising iostnictian with sUndardbed tests? 

2. Should teachers use standardized tests if the fourth step is not takenP 
Why? 

3. Why must the supervisor assist the teacher in this work? 

4. Why is a geuersl aim not sufficient? 

6. What are the objectious to our present courses of study with respect 

to the statement of aim? 
6. How can standardized tests be used in setting the aim for ike teacher? 
T. Summarize the uses which a supervisor can make of standardized 

tests. 



1 



INDEX 



Abilit?, relation to perfc 

Algebrft. problem of meo-siiremenl 
in, iH; Fundumeiital uperatioas 
of, 823; remedial iiutructioa in, 
831 ff. 

Algebrft TOEta: Standard Research 
Teata in Algebra, 888. 831 ; Thorn- 
dike's Atgebm Tests, itS; IndUnn 
Algebrs Tests. 289, 839; Standanl- 
ized Testa in First-yeur Algebra, 
289,839. 

Analysis of ability in Brithmetie, 19 
ff., 37. 

Anderson, H. W., 111. 

ArithmstiG: problem of mensure- 
inent, 17 fF.; types of examples, 19; 
laws of habit formation applied, 
i53 ff.; remedial instruction, 53 B.; 
individual differences, 55 ff.; de- 
vices tor remedial instruction, 
figff. 

ArithiQBtlcEd Tests : Courtis Stand- 
ard Research Tests, Series B, 
23. 38; Cleveland Survey Teata, 
25, 40: Woody Arithmetic Scales. 
89, 41; addition of fraetions, 31, 
48; Stone's Reasoning Test, 3S. 
42; Starch's Reasoning Test, 37; 
Courtis's Reasoning Tests. 37. 

Arithmetical abilities, nature of, 
18 ff. 

Ashbaugh, E. J., 171, 172, 173. 

Average, 847. 

Average deviation, 247. 

Ayres. L. P., 15, 113, 116, 186, 148, 
152, 166, 173. 

Ayres's Handwriting Scale, 158; 
Adult Scale. 158; "Get^sburg 
Edition," 158. 

Ayres's Spelling Scale, 113-18; 121. 

Ballou, F. W^ la, 34, 40, 1»4, 279, 



£8S. \ 

r»nrpn fnr fTmn^n ' 



Bell. J, C. 833. 

Hobbitt, Franklin, 889. 

Boston, standard scores for Courtis 
Standard Research Testa, Scries 
U, 40: standard scores For addition 
of fractions, 42; results of using 
Courtis Tests, 8B«; copying teat. 
21S. 

Breed, F. S.. 181. 194. 

Breed and Down's Handwriting 
Scale, I S3. 

Breed and Frostic Composition 
S<.'ale, 195. 

Brown, H. A.. 74. 

Brown's Silent Reading Test. 74-76; 
87-89. 

Bmwnell, Baker, 104. 200. 

Buckingham, B. R., 115, 875. 

Buckingham's Spelling Scale, 124. 

Buckingham's Grammar Teat. 215. 

Butte, Montana, scores for Stone's 
Reasoning Test. 43-14. 

Carter, R. E., 3. 

Central tendencies: median, 248, 
816; average, 247; mode, 247. 

Charters. W. W., IS, 187. 281. 291, 

Childs, H. G., 231. 839. 

Clark, J. R.. 887. 232, 27S. 

Class instruction, waste of time in, 
55; eHecta of, in arithmetic, 58; 
modified, 59. 

Cleveland Survey Arithmetic Tests, 
85: standards, 40-11: as instru- 
ments for disgoosis, 50 ff. 

Coefficient of correlation, 251. 

Composition Scales: Hiltegas 
Scale, 191, 199, 800; Thorndike 
extension of Hillegas Scale. 195; 
Harvard-Newton Scale. 195, 200, 
204; Breed and Frostic Pcjile, 195; 
Willing Scale, 198, 804, 206; Nas- 
sau County Supplement, 196, 



Com|Nigition, rdlability of 

ment in. 196 ff. 
Copyiog Teat, SIS R.; kinds of errors 

DuulclilT. 

CotreUtion, 150. 

Course of study, 289 B. 

Courlis. S. A., 19, 37, 39. 18, 60, 73, 

81, liO, 140. 

Courtis's English Teala, 73-74. 

Courtis iStandard Research Teats, 
Series B, 23, 56; standards, 40; aa 
instruments for diagnosis, 50. 

Courtis's Silent Reading Teste, 81- 

82, 89. 

Courtis's Standard Practice Tests ia 

AriUimetic. 60; in Spelling, 140. 
Criteria for evaluating testa, 85-80. 
Gulp, Vernon, 164. 
Cycle Principle, 275. 

Davidson, P. E., Ill, 

Diagnosis: inalgebra, 232; in arith- 
metic, 32, 4S S.; in handwriting. 
158 B.; in reading, 93 ff., 97; in 
spelling, 130-35. 

Efficiency. deSnition of, 264. 

Elliot, E. C, 6. 

Evaluation ot eiercises: per cent of 
pupils solving basis, iCTfl; opinion 
of competent judges, 278; teacher- 
judgment, 279. 

Evaluation of reading tests, 85-90. 

Ezaniiiiatloni: sources of error in, 
8 ff.; preparation of questions, 
8 ff.; value ot questions, 8 ff., 87S 
S,; marking of papers, 5S., !80 ff. 

Example vs. problem, 22. 

Examples, types of, io arithmetic, 19. 

Fordyce. Charles. 120. 

Pomgn languages: Starch's tests. 

234; Hanus Latin tests. 235; Hen- 

mon Latin tests. 235. 
Fractions, teats in, 34; standards, 42. 
Freeman. F. N., 135. 196, 100, 161, 

168. 169, 171, 172, 170. 296. 
Freeman's Handwriting Scale, 1S3- 

64; iaO-02. 
Froslie, F. W., 194. 



Oeomstr;, test in, 233. 

GillUand. A, R.. 111. 

Qranunar, measurement of ability 

in, 212. 215. 
Grand Rapids. Michigan, scores for 

Cleveland Survey arithmetic tests, 

41. 
Graphical r«>re8enlatlon. 253 ff. 
Graves, S. Monnie, 181. 
Gray. C. T., 164, 163, 107. 
Gray. W. S., 78. 83. 
Gray's Handnriting Score Cord. 154- 

S6. 158-59. 
Gray's Oral Beading Test. S3-S5, 

Gray's SUent Reading Tests, 78-79, 



Habit Formation: laws of. applied 
to arithmetic-. 53 ff.; to spell- 
ing, 135-36, 140; to handwriting. 



Handwriting: probli 

ment. 146-16; securing Bpecimen*. 
147-48; methods of using scales. 
156-58; diagnosis, 15S-62; using 
Freeman's scale. 161-62; reliabil- 
ity of scores, 168-64; training in 
using scales, 165-67; standards, 
168-74; individual differences, 
175 ff.; nature of ability, 175; re- 
medial instruction. 180 ff.; systems 
of penmanship compared, 181; 
movement. 181 ; rhythm, 182, 186; 
speed. 182: laws of habit formoi- 
tion, 183 S.; speed and quality, 
183; devices of remedial instruc- 
tion, 184 ff.; motivation of prac- 
tice, 187, 

Handwriting Scales: Thorndike, 
148; Ayre3,152; Johnson andStone, 
152-53; Breed and Downs, 153; 
Freeman, 153-54; Gray, 154-56. 

Hanus, Paul, 235. 

Harvard-Newton Composition Scale, 
105. 270; directions for using, 200; 
slandards, 204. 



g 



Harvey. Nathan A.. 164. 

Hillegas, M. B,. lOi, 878. 

Eillegas Composition Scale, 194. 

2TH; directions for usiog, 190: 

standnrds, 200. 
Houser, J. D., 1*3. 
Hudelaon, Earl, Itfl. 
Hurt. A. O., 160. 

lodiTidual difterenceB: arithme- 
tic 55 3.; silent reading, 105; 
Bpelliog. 116-17, 194-35; hand- 
writing, 175 B, 

IndlviduBl Instruction: arithme- 
tic. 69 ff.: handwriting, 175; apell- 
ing, 130 ff. 

Inglis, Alexander, S. 

Intelligence Tests, 105. 

Johnson, F. W.. 4. 222. 

Johnson, Harry, 164. 

Johnson, J. H., 107-69. 

Johnson and Stone's Handwritine 

Scale. 152. 153. 
Jones. N.F., 114, 132-^33. 
Jones, R. G., 82. 87. 
Jones' Visual Vocabulary Tests, 82. 

87, 89. 

Judd, C. H., 20, 171, 172. 

Kansas Silent Reading Tests, 79-81, 

88-89, 91-92, 99. 
Kayfete, Isidore, 194. 
Kelly. F. J., 4, 79, 163, 197, 280. 
King, Irving, 1S4. 
King, W. I.. 240, 247. 

Lane, H. A., 298. 

language: problem oF measure- 
ment, 192; measurement of ability 
by completion-tests, 210: value at 
testa, 218 ff.; analysis of ability, 
220. 

LawB of bablt farmaitlDn: applied 
to arithmetic. 53 ff.; applied to 
spelling, 135-36, 140; apphed to 
handwriting, 183 ff. 

Lewis, E. E.. 103, 174. 

Lu^ H. G., 139. 



Manuel. H.T.. 164, 191. 

Median, 242, 246. 

Minimum essentials. 15. 

Minnesota Scale lieU, 73, 80-87. 

Mode, 247. 

Monroe, Waller S.. 37. 223, 228. 

Morrison, J. C 301. 

Movement in handwriting. 181. 



Oral Reading Teats: Jones' visual 

vocabulary tests, 92; Hag^ty's 
visual vocabulary tests, 83; Gray 'a 
Oral Reading Test, 83-85. 
Oral reading, overemphasis on, 09- 
100. 



Percentiles, 249. 

Perfonnance. Sea Ability, 

PhjBiCB, test in, 237. 

Pinter, R., Ill, 167. 

Practice tests jn arithmetic, 60 ff.; 
in spelling, 140-41. 

Probable error, 249. 

Problem vs. example, 22. 

PrDblem of measurement: arith- 
metic, 17 ff.; reading. 60; spelling, 
112; hund«Titing. 145-40; lan- 
guage, 192; algebra, 224. 

PupU, value of tests to, 95 96. 



QuarUles. 249. 



66; 



. 97; r 



dial instruction, 96-107. 
Reading tests. See Oral reading, tx 

?aleat reading. 
Reading tests evaluated. 85, 90. 
Reasoning tests in arithmetic, 36. 
Reliability of meaaureB, 2; in 

ariUimetic. 44 ft.; iareHJiing.87ff.; 

in spelling, 116 fl.; in handwriting, 

162 ff.; in composition, 190. 
Remedial Instruction, 2ST ff.; in 



arithmetic, SSff.; in reading. Mff,; 
in !ipeiliiig, 135 fS.: m bandwritiag. 
IBU B.-. in algebra, 231 S. 

Rhytbta, in hBodwriting, ISB; devel- 
opment of, 186. 

Ri<«. J. M.. 144. 

Ricliards.A. M.. 111. 

Rugg, H. O., 224, 227, !3e. WS. 

Sackrtt, L. P., 169. 

SalL Lake City, Dtab, scores for 
Stone's Reasoniog Ttst, M. 

School muks, inaccuracy d, 1 B.; 
indefiniteness of, US. 

School marks c». scores, 259. 

Scientific management, principles of, 
46 ff.; deSned, 994 ff. 

Scores, accuracy of: arithrocticT 44 
ft.; spelling, 116; handwriting, 1S2; 
composition, 196. 

Scores, translation of, 260 8. 

Scars, J. B., 134. 

Silent Beading Tests : Thomdike's 
Scale Alpha, 71-73; Minnesota 
Scale Beta. 73; CourtU'a Jinglish 
Testa, 7»-7*; Brown's Silent 
Heading Test, 74-^70; Starch's Si- 
lent Reading Tests. 7ft-78-. Gray's 
Silent Reading Testa, 78-79; Kan- 
sas Silent Reading Tests. 79-Sl; 
Courtis's Silent Reading Tests, 
81-82. 

Smith, James H.. Sfh6S. 

Speed and quality, 183. 

Speed in handwriting, IBS. 

Spelling demons, 132-33. 

SpelUng: definition of ability, IIB- 
13. 125; problem of measurement, 
113; making a spelling test. 114- 
24; individual differences. 118-17, 
134-3S; number of words to use, 
118-10; a timed sentence test, 12B- 
24; diagnosis, 130-35; types of 
errors, 13S-35; remedial instruc- 
tion. 135-42; causes of errors, 136- 
38; devices for teaching, 138-42; 
practice tests, 140-41. 

Standardized tests, use in supcrvi- 
sLon. 284 ff.; results of using. 299 ff. 

Standards: Courtis Standard Re- 



search Tests. Series B. 40; Cleve- 
land Survey Arithmetic Tests, 40- 
41: Woody's ArithmelicScales,42; 
Addition of Fractions, 42; Stone's 
Reasoning TesU 42 ff.; Thom- 
dike's Visual Vocabulary Scale, A, 
69: Thomdike's Scale, Alpha, 73; 
Brown's Silent Reading Test. 76; 
Starch's Silent Reading Tests, 78; 
Gray's Silent Reading Teats, 79; 
Kaiisiu Silent Reading Testa, 81, 
92, 99; Ayres's Spelling Scale, 128- 
30; Starch's Spelling Scale, 130; 
Handwriting Scales, 168-74; Hille. 
gas's Composition Scale, 800; Har- 
vard-Newton CompoMtion Scale. 
2M; Trabue's Completion-Test 
Language Scales, 212; Slardi's 
Grammatical Scales, 213: Copying 
Test, 217; Standard Research 
Tests in Algebra, 231 , 

Standards, basis of, 263 fi.; ^rpea 
of. *70; use of. 293 ff. 

Starch, Daniel. 0. 37, 70. 76. liM, 
169, 171, 172. 212. 234, 837. 

Starch's Graromatical Scales, 212 ff. 

Starch's Sifcnt Reading Tests, 76-78, 
87, 88-8S. 

Starch's Spelling Scales, 12fi-!8, 13a 

Stockard, L. V„ 233. 

Stone, C. R., 187. 

Stone, C. W., 18, 42. 

Stone's Reasoning Test, 39, 42. 

Studebaka-, i. W.. 62. 

Studebaker Economy Practice Estx- 
cisea,6i. 

Superintendent, value of tests to, BO- 
93. !84S. 

Supervision, steps in, 285 ff. 



Teacher, Value of tests to, BS-B5. 
TcBcbers, specifications for, 288. 
Terman, L. M., 105. 
Tests, derivation of. 30, 34. 873 ff. 
Thorndike, E. L.. 71, 118, 125, 148, 
165, 194, 197, 240, 247, 249, 3£0. 



1,279. 



Thomdike'a Extension of I 

Scale, 19S. 
Thorndike'a Handwriting Scale, 

lis. 

Thoradike's Reiiding Scale, Alpha, 

71-73. ST. 
Thoradike's Visual Vocabulary 

Scale, 67. 
TidymBU, W. F., 185. 
Trabue, M. R., 194, 197, ZIO, 875. 
Trabue's Completion-Teat Language 

Scale, eiO ff., i\i. 



Variability, 

viation, 247; percentiles, quartiles, 

probable error, 819. 
Vocabuluy Scales; Thorndike'a 



70. 

Wallin, J. E. W.. lU. 

Wnssen, Alfred W., 90. 

WUling, M. H., 194. 

Willing'fl Composition Scale, 1B8; 1 
directions for using. S04; stand- 
ards. «oe; scale reproduced, 206 ff. 

Wilson, G. M., 172, 187. 

Wilson, H. B., 187. 

Wilham. E. C. 172. 

Woody, Clifford, 29, 275. 

Woody's Arithmetic Scales, !0, 42; 
as an instrument for diagnosis, S3, 



Ziedler, Richard, 111. 



■iiiiii 

3 6105 033 373 650 



311 






|a 



2l 



